Getting WISEr with Bioinformatics -2/4

I had a wonderful opportunity to learn about DNA sequencing and bioinformatics while attending the virtual nine-week Waksman Institute Summer Experience program. We learned how to isolate complementary DNA (cDNA) fragments from the duckweed plant (Landoltia punctata) using techniques such as reverse transcription, plasmid vectors, polymerase chain reaction (PCR) and gel electrophoresis. We used bioinformatics tools such as BLAST to compare the sequenced DNA with known sequences stored in online repositories like GenBank. We also learned how to use the ORF Toolbox to determine whether a cDNA codes for a protein, and how to look for similar proteins in other organisms. Duckweed grows on the surface of lakes and wetlands, and it is being researched as a potential food source and for bioremediation. With the help of the vWISE team, I submitted clones W418.20 (Landoltia punctata clone W418.20, 2021) and W417.20 (Landoltia punctata clone W417.20, 2021) to GenBank. Clone W418.20 codes for a protein that is similar to acid phosphatase/vanadium-dependent haloperoxidase-related protein.

Part one of this series showed the steps to create a cDNA library of random DNA fragments isolated from an organism. Part two covers the steps to take a cDNA sample from that library and prepare it for sequencing. The cDNA fragment must be above a certain length for sequencing to be worthwhile.


Prepare plasmid DNA from library for sequencing

Amplify the cDNA sample

The amount of the DNA fragment available for analysis is very small, so it must be amplified before it can be sequenced. A technique called polymerase chain reaction (PCR) selectively replicates, and thereby amplifies, a specific region of the cDNA sample using an enzyme called Taq polymerase. First, the double-stranded DNA fragment, known as the DNA template, is denatured (separated into single strands) by heating it to 95 °C for 1 minute. Next, the temperature is lowered to 50 °C for 2 minutes so that short DNA primers anneal (attach) to the targeted region on the single strands. The polymerase uses these primers as starting points to build the complementary strand from free nucleotides (deoxynucleoside triphosphates, or dNTPs), extending the fragment. This synthesis, or extension, stage takes place when the solution is heated to 72 °C for 2 minutes. These three steps (denaturing, annealing and synthesizing) make up one thermal cycle. Because each cycle roughly doubles the number of copies of the targeted region, repeating the cycle many times produces a large number of copies of the DNA fragment (Yourgenome, 2021).
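To see why a few dozen cycles are enough, here is a minimal sketch that models PCR as ideal doubling per cycle, using the temperatures and hold times described above. The cycle count of 30 and the `efficiency` parameter are assumptions for illustration; real reactions fall short of perfect doubling and eventually plateau, and the sketch ignores the time the machine spends ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    minutes: float


# One thermal cycle, using the temperatures and hold times from the text.
CYCLE = [
    Step("denature", 95, 1),
    Step("anneal", 50, 2),
    Step("extend", 72, 2),
]


def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Estimate copies of the target region after `cycles` rounds.

    efficiency=1.0 means every template is copied in every cycle (ideal doubling);
    lower values model a less efficient reaction.
    """
    return start_copies * (1 + efficiency) ** cycles


def cycling_minutes(cycles: int) -> float:
    """Total hold time at temperature, ignoring ramp time between steps."""
    return cycles * sum(step.minutes for step in CYCLE)


if __name__ == "__main__":
    n = 30  # hypothetical cycle count, not taken from the vWISE protocol
    print(f"{n} cycles: about {copies_after(n):.2e} copies per starting template")
    print(f"Hold time: {cycling_minutes(n):.0f} minutes")
```

The exponential term is the key point: each extra cycle multiplies the copy number instead of adding to it, which is why even a tiny starting amount becomes enough to sequence.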

The image below (from Thermo Fisher Scientific) shows the annealing and extension steps of PCR.

From the cDNA library, a bacterial colony is selected and grown in an overnight culture. The culture is centrifuged so that the bacterial cells collect as a pellet. The pellet is resuspended in buffer P1, a resuspension buffer that degrades unwanted RNA and preserves the plasmid DNA inside the bacterial cells. A PCR reaction mix, which contains all the components needed for PCR, is added to the plasmid DNA in tubes. The tubes are then placed in a thermal cycler, a machine that repeatedly raises and lowers the temperature of the tubes and their contents, which amplifies the DNA fragment in each tube. The video below, by Nadine Bongaerts, shows how the process is done in a lab.

Video of amplifying cDNA fragments from a culture using Polymerase Chain Reaction and thermal cyclers
(Nadine Bongaerts, Synthetic Biology One, 2017)
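When the PCR reaction mix is added to several tubes at once, it is usually prepared as a single batch scaled to the number of samples. The sketch below shows that arithmetic. The per-reaction recipe (a 25 µL reaction built from a 2x master mix, primers and water) and the 10% overage are hypothetical values chosen for illustration, not the recipe used in the vWISE program.

```python
# Hypothetical per-reaction recipe in microlitres; NOT the vWISE protocol.
PER_REACTION_UL = {
    "2x PCR master mix (Taq polymerase, dNTPs, buffer)": 12.5,
    "forward primer, 10 uM": 1.0,
    "reverse primer, 10 uM": 1.0,
    "nuclease-free water": 9.5,
}
TEMPLATE_UL = 1.0  # plasmid DNA, added separately to each tube


def master_mix(n_samples: int, overage: float = 0.1) -> dict[str, float]:
    """Scale each component for n_samples reactions, plus `overage` extra
    (10% by default) to cover pipetting losses."""
    scale = n_samples * (1 + overage)
    return {name: round(vol * scale, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    for name, vol in master_mix(8).items():
        print(f"{name:<50} {vol:>7.2f} uL")
    per_tube = sum(PER_REACTION_UL.values())
    print(f"Dispense {per_tube:.1f} uL per tube, then add {TEMPLATE_UL:.1f} uL template")
```

Making one batch and dispensing it keeps every tube identical, so differences between lanes on the gel come from the templates rather than from pipetting.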

Identify size of the cDNA sample

The length of the DNA fragment determines whether sequencing will be effective. Gel electrophoresis is a technique used to estimate the length of DNA fragments. The phosphate groups in DNA give it a net negative charge, so when the DNA is placed in an electric field it is pulled through a gel matrix made of agarose, a polysaccharide extracted from seaweed (Lee, Costumbrado, Hsu, Kim, 2012). Larger DNA molecules move more slowly through the matrix and travel shorter distances, while smaller molecules travel farther. Over a useful range, the distance traveled decreases roughly linearly with the logarithm of the fragment length in base pairs. Before the fragments are loaded onto the gel, they are treated with dyes so that they can be seen. The samples are run alongside a DNA ladder, a mixture of DNA fragments of known lengths, and the bands become visible when the stained DNA is exposed to ultraviolet light. Comparing how far a sample traveled with how far the ladder's fragments traveled gives an estimate of the sample's size.
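The comparison with the ladder can be written down as a standard curve: fit a straight line to log10(size) against migration distance for the ladder bands, then read an unknown band's size off that line. The sketch below does a plain least-squares fit; the ladder sizes and distances are hypothetical numbers for illustration, not measurements from the duckweed gels.

```python
import math


def fit_standard_curve(ladder: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(size in bp) against migration distance.

    ladder: (size_bp, distance_mm) pairs for bands of known size.
    Returns (slope, intercept) such that log10(bp) ~ slope * distance + intercept.
    """
    xs = [d for _, d in ladder]
    ys = [math.log10(bp) for bp, _ in ladder]
    n = len(ladder)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance_mm: float, slope: float, intercept: float) -> float:
    """Convert a band's migration distance back into an estimated length in bp."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical ladder bands (size in bp, distance migrated in mm).
LADDER = [(1000, 20.0), (500, 30.0), (250, 40.0), (100, 53.0)]

if __name__ == "__main__":
    m, b = fit_standard_curve(LADDER)
    print(f"slope = {m:.4f} per mm (negative: bigger fragments travel less)")
    print(f"band at 35 mm is about {estimate_size(35.0, m, b):.0f} bp")
```

A band that stops between the 500 bp and 250 bp markers gets an estimate between those sizes; the fit is only trustworthy inside the range the ladder covers.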

Shown below is an example of a gel run with a DNA ladder. The leftmost lane, “M”, is the reference lane against which the bands in the other lanes are compared to calculate their lengths. For the plasmids used with the duckweed cDNA, the DNA fragment had to be longer than 500 bp (base pairs) to produce a useful sequence. The plasmids used were around 200 bp.
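Once each lane has an estimated size, deciding which samples move on to sequencing is a simple filter against the 500 bp cut-off mentioned above. In this sketch the lane labels and size estimates are hypothetical; in practice the estimates would come from comparing each band with the ladder.

```python
MIN_LENGTH_BP = 500  # cut-off stated in the text for a useful sequence


def ready_for_sequencing(estimates: dict[str, float], min_bp: int = MIN_LENGTH_BP) -> list[str]:
    """Return the labels of lanes whose estimated band size is above the cut-off."""
    return sorted(label for label, bp in estimates.items() if bp > min_bp)


if __name__ == "__main__":
    # Hypothetical size estimates, e.g. read off a ladder-based standard curve.
    lanes = {"lane 1": 1200.0, "lane 2": 350.0, "lane 3": 650.0, "lane 4": 480.0}
    print(ready_for_sequencing(lanes))
```

Using a strict "above" comparison mirrors the wording of the requirement; a band right at the cut-off is not passed through.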

The video below from the University of Leicester shows how to run an agarose gel electrophoresis experiment and visualize the results alongside a DNA ladder.

Running an agarose gel and visualizing the results (University of Leicester)

Create miniprep of the cDNA sample for sequencing

Once the plasmid DNA is confirmed to have an appropriate length for sequencing, the sample goes through minipreparation (miniprep) steps to prepare it for DNA sequencers (Erickson, n.d.). Buffer P2, a lysis buffer, is added to the resuspended bacteria to break open (lyse) the cells and release the plasmid DNA. The lysate is then treated with buffer N3, a neutralization buffer that causes the chromosomal DNA and proteins to precipitate so they can be separated from the plasmid DNA. The mixture is centrifuged, and the supernatant (liquid), which contains the plasmid DNA, is transferred to a spin column sitting in a collection tube. The column contains silica to which the plasmid DNA sticks, while other material passes through into the collection tube. The column is then washed with buffer PE, a wash buffer that removes the remaining impurities while the plasmid DNA stays bound. Finally, the plasmid DNA is eluted (released) from the column with buffer EB, an elution buffer, into a microcentrifuge tube, which is labeled and shipped to a sequencing lab.
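One way to see the logic of the miniprep is to track what is left in the tube after each buffer step. The sketch below is a toy model only: the component names and the clean "removal" at each step are simplifications for illustration, since the real buffers work chemically rather than as exact set operations.

```python
# Toy model of a plasmid miniprep: which components remain after each step.
# Component names and removals are illustrative simplifications.
CELL_CONTENTS = {"plasmid DNA", "chromosomal DNA", "RNA", "protein", "cell debris", "salts"}

STEPS = [
    ("Buffers P1 + P2: resuspend and lyse; P1 degrades RNA", {"RNA"}),
    ("Buffer N3 + centrifuge: pellet chromosomal DNA, protein, debris",
     {"chromosomal DNA", "protein", "cell debris"}),
    ("Buffer PE: wash the silica-bound plasmid", {"salts"}),
    ("Buffer EB: elute plasmid DNA into a clean tube", set()),
]


def run_miniprep(components: set[str], steps: list[tuple[str, set[str]]]) -> set[str]:
    """Apply each step's removals in order and report what remains."""
    remaining = set(components)
    for description, removed in steps:
        remaining -= removed
        print(f"{description}\n  -> {sorted(remaining)}")
    return remaining


if __name__ == "__main__":
    final = run_miniprep(CELL_CONTENTS, STEPS)
    print("In the tube sent for sequencing:", sorted(final))
```

The order matters: the chromosomal DNA has to be pelleted before the supernatant goes onto the column, otherwise it would bind to the silica along with the plasmid.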

Shown below is a video by Aleks Nivina of Université Paris Descartes on how to harvest plasmid DNA with minipreps.

How to harvest plasmid DNA with minipreps
(Aleks Nivina, Synthetic Biology One, 2017)

DNA Sequencers

Sequencing labs use different DNA sequencing technologies to determine the order of the nucleotides in DNA fragments. Once the sequence is known, further analysis can be done. There are currently two main types of sequencing technology: Sanger sequencing and next-generation sequencing (ThermoFisher Scientific, n.d.). In Sanger sequencing, a short primer binds to the DNA fragment. A polymerase enzyme extends the primer by adding nucleotides that are complementary to those in the template strand. Extension stops when the polymerase adds a dideoxynucleotide triphosphate (ddNTP), a modified nucleotide that lacks the 3′ hydroxyl group needed to attach the next nucleotide (Schoales, 2015). The ddNTPs can be labeled with either fluorescent dyes or radioactive isotopes. With fluorescent dyes, the terminated fragments are passed through a thin glass capillary filled with gel in an electric field, which separates them by length (this is called Sanger sequencing by capillary electrophoresis). As each fragment reaches the end of the capillary, a laser excites its dye and a detector identifies it. Computer software then converts the detected signals into a sequence of bases; each base assigned this way is called a base call. Sanger sequencing is widely used and is cost-effective for small DNA fragments. An example of a Sanger sequencer is the Applied Biosystems SeqStudio Genetic Analyzer.
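The chain-termination idea can be simulated in a few lines. This is an idealized sketch: it assumes every possible termination point is present exactly once, ignores the primer's own length and any errors, and uses a made-up template written in the order the polymerase reads it. Sorting the fragments by length plays the role of the capillary, and reading each fragment's end dye in that order recovers the new strand's sequence.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template: str) -> str:
    """Strand the polymerase builds, base-paired to the template.
    Assumes the template is written in the order the polymerase reads it."""
    return "".join(COMPLEMENT[base] for base in template)


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Idealized reaction: one fragment ending at every position.
    Each entry is (length past the primer, dye-labelled ddNTP at its end)."""
    strand = new_strand(template)
    return [(i + 1, strand[i]) for i in range(len(strand))]


def read_capillary(fragments: list[tuple[int, str]]) -> str:
    """Shorter fragments reach the detector first; reading the end dyes
    in that order gives the sequence of the new strand."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    template = "TACGGATC"  # hypothetical template
    fragments = terminated_fragments(template)
    random.shuffle(fragments)  # fragments are all mixed together in the tube
    print(read_capillary(fragments))
```

Because the readout depends only on fragment length and the end label, the order of the fragments in the tube does not matter; the capillary sorts them.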

The video below from Thermo Fisher Scientific explains Sanger sequencing with some helpful visual aids.

How does Sanger Sequencing Work?
(Thermo Fisher Scientific, 2015)

The drawback of automated Sanger sequencing is that each reaction reads only one short fragment, so reading a large number of genes takes a long time. Next-generation sequencing works on a similar principle but runs many sequencing reactions at the same time, in parallel. This lets it sequence entire genomes much faster. For example, it can sequence an entire human genome in about a day, whereas Sanger sequencing took over a decade to produce a draft of the human genome.
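A back-of-the-envelope calculation shows why running reactions in parallel matters so much. The sketch counts how many reads are needed to cover a genome a given number of times on average (the coverage), and how many instrument runs that takes. The genome size is approximately that of the human genome; the coverage, read lengths and reads-per-run figures are hypothetical round numbers, not specifications of any particular instrument.

```python
import math

GENOME_BP = 3_100_000_000  # approximate size of the human genome


def reads_needed(genome_bp: int, read_length: int, coverage: float) -> int:
    """Reads required so that each base is read `coverage` times on average."""
    return math.ceil(genome_bp * coverage / read_length)


def runs_needed(reads: int, reads_per_run: int) -> int:
    """Instrument runs needed when each run produces `reads_per_run` reads in parallel."""
    return math.ceil(reads / reads_per_run)


if __name__ == "__main__":
    coverage = 10  # hypothetical
    # Hypothetical instrument profiles: (read length in bases, reads per run).
    profiles = {
        "capillary Sanger": (800, 96),
        "massively parallel NGS": (200, 80_000_000),
    }
    for name, (read_len, per_run) in profiles.items():
        reads = reads_needed(GENOME_BP, read_len, coverage)
        print(f"{name}: {reads:,} reads in {runs_needed(reads, per_run):,} runs")
```

Even though each next-generation read is shorter, producing millions of reads per run cuts the number of runs by several orders of magnitude, which is where the speed-up comes from.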

The video below explains how next-generation sequencing works in the Ion Torrent Next-Generation Sequencer from Thermo Fisher Scientific. Its approach differs from Sanger sequencing: the DNA is broken into smaller fragments, and these fragments are copied onto small beads. The beads are placed in wells on a semiconductor chip, which is then flooded with a solution of one of the four nucleotides. When that nucleotide is complementary to the next base on a fragment, it is incorporated and a hydrogen ion is released. This changes the pH of the solution in the well, and the chip detects the change as a change in voltage (Thermo Fisher Scientific, 2020). The process is repeated with each of the other three nucleotides, across many wells on a chip and with multiple chips in the machine. In this way, large genomic regions are sequenced at the same time.

Ion Torrent Next-Generation Sequencing
(Thermo Fisher Scientific, 2020)
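The flow-by-flow readout described above can be sketched as follows. This assumes an idealized, error-free well and a hypothetical repeating flow order of T, A, C, G. One detail worth knowing: a run of identical bases (for example "TT") is incorporated during a single flow, giving a proportionally larger signal, so each flow yields a count rather than a yes/no answer.

```python
def flowgram(strand: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Simulate ideal flows: for each nucleotide flowed, count how many bases
    are incorporated in a row (0 if it does not match the next base)."""
    if set(strand) - set(flow_order):
        raise ValueError("strand contains bases that never appear in the flow order")
    signals = []
    pos = 0
    flow = 0
    while pos < len(strand):
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        while pos < len(strand) and strand[pos] == nucleotide:
            count += 1  # each incorporation releases one hydrogen ion
            pos += 1
        signals.append((nucleotide, count))
        flow += 1
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the sequence by repeating each flowed nucleotide by its signal count."""
    return "".join(nucleotide * count for nucleotide, count in signals)


if __name__ == "__main__":
    strand = "TTAGC"  # hypothetical sequence being synthesized in one well
    signals = flowgram(strand)
    print(signals)
    print(decode(signals))
```

Flows that match nothing still cost time, which is one reason every well on the chip is read in parallel.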

The sequencer outputs an electropherogram and a text file containing the DNA sequence as letters, based on the automated base calls. An electropherogram is a type of chromatogram that shows a graph of colored peaks, one color for each type of nucleotide. The software's base calls are not always accurate, so a person sometimes needs to read the electropherogram and make the base calls by hand. Shown below is a chromatogram of one of the DNA fragments that I studied. The chromatogram was generated on an Applied Biosystems 3730 Series Genetic Analyzer and viewed with the FinchTV chromatogram viewer.

Screenshot of part of the 20JM328.20-SP.ab1 chromatogram of the Landoltia punctata clone W417.20, as seen in FinchTV
(Jaison, Vershon, Mead, 2021)
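Deciding which base calls a person should check by hand can be made systematic if the base caller reports a quality score for each base. A common convention is the Phred score Q, where the probability that a call is wrong is 10^(-Q/10), so Q20 means about a 1 in 100 chance of error. The sketch below flags calls below a threshold; the bases and quality values are hypothetical, not data from clone W417.20, and the Q20 cut-off is an assumption. If the .ab1 file carries per-base qualities, Biopython can typically read them with `SeqIO.read(path, "abi")` via the record's `letter_annotations["phred_quality"]`.

```python
def error_probability(q: int) -> float:
    """A Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def bases_to_review(bases: str, quals: list[int], min_q: int = 20) -> list[tuple[int, str, int]]:
    """List (1-based position, base, quality) for every call below min_q."""
    return [(i + 1, b, q) for i, (b, q) in enumerate(zip(bases, quals)) if q < min_q]


if __name__ == "__main__":
    # Hypothetical calls and qualities, for illustration only.
    bases = "ATGGCNTACG"
    quals = [12, 35, 40, 38, 15, 4, 33, 30, 18, 9]
    for pos, base, q in bases_to_review(bases, quals):
        print(f"position {pos}: {base} (Q{q}, ~{error_probability(q):.0%} chance of error)")
```

These flagged positions are the places to open in a chromatogram viewer and compare against the actual peaks.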

During this class, which was held virtually because of the COVID-19 pandemic, I worked on two chromatograms to validate the base calls and to carry out further analysis with bioinformatics tools. Part 3 covers the bioinformatics tools that I used in the class.

References

Landoltia punctata clone W418.20 acid phosphatase/vanadium-dependent haloperoxidase-related protein-like, mRNA sequence. Jaison, C., Vershon, A. and Mead, J., NCBI, May 18, 2021. Accession# JZ984547

Landoltia punctata clone W417.20, mRNA sequence. Jaison, C., Vershon, A. and Mead, J., NCBI, May 18, 2021. Accession# JZ984546

What is PCR (polymerase chain reaction)? (2021, July 21). Yourgenome.


Hoy, M. A. (2018). Insect Molecular Genetics. Elsevier Gezondheidszorg.

Shechter, D. (2019, November 21). What is the Purpose of Homogenization? BEE International.

Johnson M., Carpenter E., Tian Z., Bruskiewich R., Burris J., Carrigan C., et al. (2012) Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes. PLoS ONE 7(11): e50226.

Rodrigues, K. F. (2020, June 26). Plant RNA Extraction with Qiagen RNEasy Kit and Subtitles [Video]. YouTube.

Purification of messenger RNA by affinity chromatography on CIMmultus™ Oligo dT column – BIA Separations. (2021). Sartorius BIA Separations.

Pray, L. (2008) The Biotechnology Revolution: PCR and the Use of Reverse Transcriptase to Clone Expressed Genes. Nature Education 1(1):94

Bacterial Transformation and Competent Cells–A Brief Introduction | Thermo Fisher Scientific – NL. (n.d.). https://www.thermofisher.com/. Retrieved June 7, 2021, from

Blogger, A. G. (2019, October 7). X-GAL: Cloning, Protein-protein Interactions, and Water Testing for E. coli. AG Scientific Blog.

Synthetic Biology One, & Bongaerts, N. (2017, September 8). How to Screen Bacterial Colonies with PCR. YouTube.

Lee, P. Y., Costumbrado, J., Hsu, C. Y., & Kim, Y. H. (2012). Agarose gel electrophoresis for the separation of DNA fragments. Journal of visualized experiments : JoVE, (62), 3923.

Running an Agarose Gel – University of Leicester. (2009, June 26). YouTube.

Erickson, F. L. (n.d.). Qiagen Plasmid Miniprep Kit Protocol. https://www.salisbury.edu/. Retrieved August 19, 2021, from

How to Harvest Plasmid DNA with Minipreps. (2017, March 13). YouTube.

What are the different types of DNA sequencing technologies? | Thermo Fisher Scientific – NL. (n.d.). ThermoFisher Scientific. Retrieved August 19, 2021, from

Schoales, J. (2015, June 17). How Does Sanger Sequencing Work? Behind the Bench.

Thermo Fisher Scientific. (2015, June 17). How does Sanger Sequencing Work? – Seq It Out #1. YouTube.

Thermo Fisher Scientific. (2020, May 21). Ion Torrent Next-generation Sequencing. YouTube.

Image designed by pikisuperstar / Freepik


