What does primary RNA transcript mean and how is it different than the mRNA sequence?

Structural Considerations

The first step in gene expression is transcription of the genetic information in DNA into RNA. The individual building blocks of RNA, ribonucleotides, have the same structure as the deoxyribonucleotides in DNA, except that (1) the 2′ carbon of the ribose sugar is substituted with an OH group instead of H, and (2) there are no thymine bases in RNA, only uracil (demethylated thymine), which also pairs with adenine by hydrogen bonding. Just like the DNA polymerases described above, the enzyme RNA polymerase II uses the nucleotide sequence of the gene's DNA as a template to form a polymer of ribonucleotides with a sequence complementary to the DNA template.

For transcription to be “correct,” RNA polymerase II must (1) use the antisense strand of DNA as a template, (2) begin transcription at the start of the gene, and (3) end transcription at the end of the gene. The signals that ensure correct transcription are provided to the RNA polymerase II by DNA in the form of specific nucleotide sequences in the promoter of the gene. After reading and interpreting these signals, the RNA polymerase generates a primary RNA transcript that extends from the initiation site to the termination site in a perfect complementary match to the DNA sequence used as a template. However, not all transcribed RNA is destined to arrive in the cytoplasm as mRNA. Rather, by an incompletely understood process, sequences complementary to introns (see above) are excised from the primary transcript, and the ends of exon sequences are joined together in a process termed splicing.45

In addition to splicing, the primary transcript is further modified by the addition of a methylated GTP “cap” at the 5′ end,46 and the addition of a stretch of A bases (the poly(A) tail, typically about 200 residues long when first added in mammalian cells) at the 3′ end.47 These modifications appear to promote the “translatability”48,49 and relative stability of mRNAs and help direct the subcellular localization of mRNAs destined for translation.
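The path from DNA template to primary transcript to mature mRNA can be sketched in a few lines of Python. This is only a toy model: the template sequence, the intron coordinates, the text marker `m7G-` standing in for the 5′ cap, and the shortened tail length are all invented for illustration and do not come from a real gene.

```python
# Toy model of transcription and RNA processing (all sequences are made up).

# Pairing rules for copying a DNA template strand into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the primary transcript (5'->3') for a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def splice(primary, introns):
    """Remove introns, given as (start, end) half-open index pairs, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(primary[position:start])
        position = end
    exons.append(primary[position:])
    return "".join(exons)


def process(primary, introns, tail_length=10):
    """Splice, then add a 5' cap marker and a 3' poly(A) tail (tail shortened for display)."""
    return "m7G-" + splice(primary, introns) + "A" * tail_length


# Hypothetical template strand, written 3'->5'.
template = "TACCGGCATTCAGGTCAAGACT"
primary_transcript = transcribe(template)

# Hypothetical intron occupying positions 6..15 of the primary transcript.
introns = [(6, 16)]
mature_mrna = process(primary_transcript, introns)

print("primary transcript:", primary_transcript)
print("mature mRNA:       ", mature_mrna)
```

The primary transcript still contains the intron, whereas the mature mRNA has lost it and gained sequence at both ends, which is the distinction the question at the top of this page asks about.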

Northern Blotting

The fundamental question in the analysis of gene expression at the RNA level is whether RNA sequences derived from a gene of interest are present in cells or tissues. Specific RNA sequences can be detected by Northern blotting, the whimsically named counterpart of Southern blotting applied to RNA. RNA can be isolated from cells in its intact form, free from significant amounts of DNA.50 Messenger RNA is much smaller than genomic DNA, so it can be analyzed by agarose gel electrophoresis without the enzymatic digestion steps that are necessary for the analysis of high-molecular-weight DNA.

RNA is single stranded and has a tendency to fold back on itself. This allows complementary bases on the same stretch of RNA to base pair with each other and form what is termed secondary structure. Because secondary structure can lead to aberrant electrophoretic behavior, RNA is electrophoretically separated by size in the presence of a denaturing agent, such as formaldehyde or glyoxal/DMSO (dimethyl sulfoxide). After electrophoresis through a denaturing agarose gel, the RNA is transferred to a nitrocellulose or nylon-based membrane in the same manner as DNA for Southern blotting (see Figure 2-4). Hybridization schemes and blot washing are essentially the same for Northern blotting as for Southern blotting. In this manner, specific RNA sequences corresponding to those in cloned DNA probes can easily be identified.

There is a lower limit to the sensitivity of Northern blotting, so that only moderately abundant mRNAs can be detected using this technique. One way to increase the sensitivity of Northern blotting is to enrich the RNA preparation for mRNA. Ordinarily, mRNA makes up less than 10% of the total RNA content of a cell or tissue. When RNA is isolated from these sources, all RNA species are being isolated, that is, ribosomal and transfer RNA as well as mRNA. As noted above, most mRNAs destined for the cytoplasm and translation are modified by the addition of a 3′ poly(A) tract. An RNA preparation can, therefore, be greatly enriched for mRNA species by removing all RNA molecules that lack the 3′ poly(A) tail.51 This can be done by exposing the RNA preparation to a tract of poly(U) or poly(T) bound to an immobilized support, such as a plastic bead. The poly(A) portion of mRNA will bind to the poly(U) or poly(T) material, and non-poly(A)-containing RNA can be washed away. After washing, the poly(A)-containing mRNA can be recovered from the solid support and used in Northern blot analysis. This procedure improves the sensitivity of Northern blotting by nearly two orders of magnitude.
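The logic of poly(A) selection can be illustrated with a small filter. This is a sketch only: the RNA names and sequences are invented, and the `min_tail` cutoff of 10 residues is an arbitrary assumption standing in for whatever tail length is needed to stay bound to the poly(U) or poly(T) support.

```python
# Toy model of poly(A) selection (names, sequences, and cutoff are assumptions).

def has_poly_a_tail(rna, min_tail=10):
    """True if the RNA ends in at least `min_tail` consecutive A residues."""
    return rna.endswith("A" * min_tail)


def select_poly_a(rnas, min_tail=10):
    """Split a name->sequence mapping into bound (poly(A)+) and washed-away RNAs."""
    bound = {name: seq for name, seq in rnas.items() if has_poly_a_tail(seq, min_tail)}
    washed = {name: seq for name, seq in rnas.items() if name not in bound}
    return bound, washed


# Hypothetical total RNA preparation.
total_rna = {
    "mRNA_1": "AUGGCCUUCUGA" + "A" * 25,
    "rRNA_fragment": "GGCUACACACGCGCUACAAUGG",
    "tRNA_fragment": "GCGGAUUUAGCUCAGUUGGGAGAGC",
    "mRNA_2": "AUGAAACGCUAA" + "A" * 30,
}

bound, washed = select_poly_a(total_rna)
print("retained on support:", sorted(bound))
print("washed away:        ", sorted(washed))
```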

A dramatic use of Northern blotting in cancer research has been the demonstration of oncogene expression in some human tumors. RNA was isolated from human tumor samples and analyzed by Northern blotting using cloned DNA probes derived from various oncogenes. The earliest observations included expression of c-abl and c-myc in human tumor cell lines and leukemic blasts.52,53 Since then, however, a large number of protooncogenes have been shown to be transcribed in primary human tumor tissue.

cDNA

The flow of genetic information usually runs from DNA to RNA to protein, according to the so-called central dogma of molecular biology. There are, however, exceptions to this rule, the most prominent of which involves the life cycle of retroviruses. These viruses encode their genetic information in RNA rather than DNA. When they invade a susceptible host cell, they direct the synthesis of a DNA intermediate that is a complementary copy of their genomic RNA. The enzyme that accomplishes this task, reverse transcriptase, is a DNA polymerase (see above) that uses RNA, rather than DNA, as a template to form a cDNA copy of the RNA.54,55 This enzyme can be used in vitro to make cDNA copies of any available RNA.

One important application of cDNA synthesis is the construction of cDNA libraries, which are basically gene libraries consisting only of the genes that are expressed in a cell or tissue of interest.56,57 Most of the time, one is really not concerned with all the DNA in the genome, for example, intron sequences, promoters, and vast regions of “uninformative” DNA that lie between genes. Furthermore, if one were interested in analyzing the genes expressed in a brain cell, why bother making a library that contained sequences for the β-globin gene? One way to construct a library comprising only tissue-specific expressed genes would be to clone all the mRNA in a specific cell or tissue of interest. Unfortunately, there is no way to ligate single-stranded RNA to a double-stranded DNA cloning vector. However, one can use all the mRNA in a cell as a template for making double-stranded cDNA, which can then be inserted into a cloning vector.
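At the sequence level, making double-stranded cDNA from mRNA amounts to taking complements twice. The sketch below assumes a made-up mRNA sequence and ignores primers, enzymes, and reaction conditions. It only shows how the first cDNA strand is complementary to the mRNA and how the second strand reproduces the mRNA sequence with T in place of U.

```python
# Toy model of cDNA synthesis (the mRNA sequence is invented for illustration).

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna):
    """Return the first cDNA strand (5'->3'), complementary to the mRNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand(first_strand):
    """Return the second cDNA strand (5'->3'), complementary to the first."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))


mrna = "AUGGCCUUCUGA"  # hypothetical spliced mRNA, 5'->3'
first = reverse_transcribe(mrna)
second = second_strand(first)

print("mRNA:         ", mrna)
print("first strand: ", first)
print("second strand:", second)
```

Because the cDNA is copied from spliced mRNA, it carries no intron sequences, which is why a cDNA library represents only the expressed, processed genes.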

To make a cDNA library, one isolates all the mRNA from a cell or tissue. Then, using this mRNA as a template, reverse transcriptase makes cDNA copies of each mRNA molecule in the mixture. The cDNA is ligated into a plasmid or phage vector as described above (Figure 2-3) and the recombinant vectors are introduced into bacteria. After growth on agar plates, each bacterial colony or phage plaque of a cDNA library houses a unique recombinant vector containing the cDNA copy of a single mRNA. Desired clones can be detected by nucleic acid hybridization to the plaques or colonies using a radiolabeled gene probe.58,59 Alternatively, if the vector containing the cDNA molecules can direct transcription of mRNA by host bacterial cells, mRNA will be synthesized, and that mRNA will be translated. In this case, each bacterial colony or plaque will produce a different protein, and each protein will have been encoded by an mRNA from the original cell or tissue being investigated. If an antibody directed against a protein of interest is available, the cDNA clone corresponding to the mRNA that encodes that protein can be identified by binding the antibody to the colonies or plaques of the cDNA library.60 This technique, called expression cloning, often employs the bacteriophage λgt11 as the cloning vector.

cDNA libraries can be used to clone cDNA for a known gene to discover the sequence of the mRNA it encodes. One application of this is the generation of ESTs (expressed sequence tags) databases by sequencing clones of various cDNA libraries. Alternatively, cDNA libraries can also be used to identify previously unknown genes. In a process called differential screening, cDNAs can be discovered that owe their existence to a particular differentiation or activation state in the cell of origin. For example, this technique has been used to identify genes whose expression is turned on by hormones or by growth factors.61

Serial Analysis of Gene Expression

The most comprehensive way to display a unique pattern of gene expression that determines the identity of a cell or tissue would be to construct a cDNA library from it and sequence every clone. This is obviously an impossible task. Rather, a technique called serial analysis of gene expression (SAGE) achieves the same end in a practical manner. In SAGE, the investigator sequences a small and unique fragment of each expressed gene (called a SAGE tag) and quantifies the number of times it appears (called the SAGE tag number). The SAGE tag numbers, therefore, directly reflect the abundance of the corresponding transcript.

The sensitivity and the quantitative accuracy of SAGE are theoretically unlimited. The generation of a SAGE library does not require any prior knowledge of what genes are expressed in the cell of interest. Therefore, SAGE is able to detect and quantify the expression of previously uncharacterized genes. SAGE is based on two fundamental principles:

1. A short (10 to 11 base pair [bp]) oligonucleotide fragment (SAGE tag) is sufficient to uniquely identify a specific mRNA transcript or its cognate cDNA. A 10-bp oligonucleotide sequence has a complexity of 4^10 (1,048,576) different combinations. Because there are only approximately 30,000 to 120,000 genes encoded by the human genome, a 10-bp sequence tag corresponding to a defined position of a cDNA is sufficient to uniquely identify any transcribed human gene.

2. Multiple 10-bp SAGE tags can be concatenated in a single plasmid, thereby greatly compressing the number of actual plasmid preparations and DNA sequencing reactions that are required to analyze a large number of genes. In practice, a single sequencing reaction can provide information on 30 to 35 different SAGE tags, and therefore 30 to 35 different cDNAs.
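Both principles can be made concrete with a short sketch. It is a simplification under stated assumptions: the tag sequences are invented, and the concatenated inserts are assumed to consist of back-to-back 10-bp tags with no linker or restriction-site sequence between them, which is not how real SAGE concatemers are laid out.

```python
# Toy model of SAGE tag counting (tags and concatemer layout are assumptions).
from collections import Counter

TAG_LENGTH = 10


def tag_complexity(tag_length=TAG_LENGTH):
    """Number of distinct sequences of a given length over the 4 DNA bases."""
    return 4 ** tag_length


def split_concatemer(sequence, tag_length=TAG_LENGTH):
    """Cut a concatenated insert into consecutive fixed-length tags."""
    if len(sequence) % tag_length != 0:
        raise ValueError("sequence length is not a multiple of the tag length")
    return [sequence[i:i + tag_length] for i in range(0, len(sequence), tag_length)]


def count_tags(concatemers):
    """Tally how often each tag appears across all sequenced inserts."""
    counts = Counter()
    for sequence in concatemers:
        counts.update(split_concatemer(sequence))
    return counts


# Hypothetical tags, each standing for one transcript.
tag_a = "AAACCCGGGT"
tag_b = "TTTGGGCCCA"
tag_c = "ACGTACGTAC"

# Two hypothetical sequencing reads of concatenated tags.
reads = [tag_a + tag_b + tag_a, tag_c + tag_a]
tag_counts = count_tags(reads)

print("possible 10-bp tags:", tag_complexity())
for tag, number in tag_counts.most_common():
    print(tag, number)
```

The tally per tag plays the role of the SAGE tag number: the more abundant a transcript, the more often its tag turns up in the sequenced inserts.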

The generation of a SAGE library is a technically challenging multistep procedure that has been described in detail.62 Figure 2-9 outlines the essence of the method.

Figure 2-9

Construction and analysis of SAGE libraries. In step 1, a cDNA library is constructed from the cells or tissue of interest, and the cDNAs are immobilized on magnetic beads at their 3′ ends. In step 2, the cDNAs are subjected to restriction enzyme ...

SAGE has been used to characterize the yeast transcriptome and to monitor alterations in gene expression patterns following ionizing radiation and during apoptosis induced by the p53 and adenomatous polyposis coli tumor-suppressor proteins.63–66 In all of these cases, the ability to measure the expression levels of thousands of different transcripts simultaneously was extremely useful for understanding the underlying physiologic processes. The most attractive application of SAGE is probably the comparison of the expression profiles of normal and tumor tissues, because a comprehensive comparison makes it possible to identify genes or subsets of genes that could serve as diagnostic or prognostic markers or as therapeutic targets.67–70 SAGE is one of the techniques used by the National Cancer Institute-funded Cancer Genome Anatomy Project (CGAP).71 A goal of this project is to create a catalog of genes expressed in various normal and cancerous tissue types. To date, more than 120 different SAGE libraries have been deposited on the National Center for Biotechnology Information/CGAP SAGEmap website (http://www.ncbi.nlm.nih.gov/SAGE), which is now the largest source of public SAGE data.71,72

DNA Microarray Analysis

Another approach to comparative gene expression profiling employs the use of DNA microarrays, often referred to as DNA “chips.” Two basic types of DNA microarrays are currently available: oligonucleotide arrays73,74 and cDNA arrays.75,76 Both approaches involve the immobilization of DNA sequences in a gridded array on the surface of a solid support, such as a glass microscope slide or silicon wafer. In the case of oligonucleotide arrays, 25-nucleotide-long fragments of known DNA sequence are synthesized in situ on the surface of the chip by using a series of light-directed coupling reactions similar to photolithography. By using this method, as many as 400,000 distinct sequences representing over 18,000 genes can be synthesized on a single 1.3- × 1.3-cm microarray. In the case of cDNA microarrays, cDNA fragments are deposited onto the surface of a glass slide using a robotic spotting device. For both microarray approaches, the next step involves the purification of RNA from the source of interest (eg, from a tumor), enzymatic fluorescent labeling of the RNA, and hybridization of the fluorescently labeled material to the microarray. Hybridization events are then captured by scanning the surface of the microarray with a laser scanning device and measuring the fluorescence intensity at each position in the microarray. The fluorescence intensity of each spot on the array is proportional to the level of expression of the gene represented by that spot. This process is illustrated in Figure 2-10.

Figure 2-10

DNA microarray analysis. In this example, RNA extracted from a tumor is end-labeled with a fluorescent marker, then allowed to hybridize to a chip derivatized with cDNAs or oligonucleotides as described in the text. The precise location of RNA hybridization ...
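Because the fluorescence intensity of each spot is taken as proportional to expression, two samples can be compared spot by spot. The sketch below uses a log2 ratio, which is a common convention but an assumption here rather than something prescribed by the text. The gene names and intensity values are invented, and the `floor` parameter is a hypothetical guard against dividing by very small background readings.

```python
# Toy comparison of microarray intensities (genes and values are made up).
import math


def log2_ratios(tumor, normal, floor=1.0):
    """Return log2(tumor / normal) for every gene measured in both samples."""
    ratios = {}
    for gene in tumor:
        if gene in normal:
            t = max(tumor[gene], floor)   # clamp to avoid dividing by ~0
            n = max(normal[gene], floor)
            ratios[gene] = math.log2(t / n)
    return ratios


# Hypothetical spot intensities in arbitrary fluorescence units.
tumor = {"gene_A": 800.0, "gene_B": 100.0, "gene_C": 250.0}
normal = {"gene_A": 200.0, "gene_B": 400.0, "gene_C": 250.0}

ratios = log2_ratios(tumor, normal)
for gene, value in sorted(ratios.items()):
    print(f"{gene}: {value:+.2f}")
```

Positive values indicate higher signal in the tumor sample, negative values lower signal, and values near zero no change.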

DNA microarray technology is evolving rapidly, with improvements in miniaturization, reproducibility, production capabilities, and the development of alternative approaches to microarray synthesis. The application of gene expression profiling methods to important questions in biology and medicine is also emerging. The ability to monitor the expression levels of thousands of genes simultaneously offers the potential opportunity to expand the analysis of cancer genetics beyond single-candidate gene approaches, toward considering genetic networks. It is becoming increasingly clear that while some tumors appear to be caused by mutations in a single gene (eg, oncogene or tumor-suppressor gene), most cancers likely arise through the collaboration of multiple genes, none of which, when considered alone, are sufficient for transformation. Until recently, the analysis of such genetic networks has been impractical, in that methods for measuring the expression levels of multiple genes in parallel have not been available. The development of DNA microarrays may, in large part, have solved this problem. Microarrays capable of monitoring the expression levels of the entire human genome (estimated to contain approximately 30,000 to 120,000 genes) are likely to become available in the near future.

Microarray analysis has proven to be a powerful method for the analysis of gene expression patterns in human cancer and for cancer classification. Gene expression profiles have been used for class prediction, or determining which samples belong to which tumor class, and for class discovery of new tumor types. The first proof-of-principle for gene expression analysis in cancer was the demonstration that acute myeloid leukemias and acute lymphoid leukemias could be accurately distinguished.77 Since then, new cancer classes have been discovered in leukemias,78 lymphomas,79,80 brain cancers,81 breast cancer,82 prostate cancers,83–85 and lung cancers,86,87 among others. Although gene expression analysis of cancer has not yet been used for clinical diagnosis, it is likely that such applications will develop during the next several years.

The challenge now is not so much how to generate complex gene expression data, but rather how to interpret it. The key is to develop methods for recognizing meaningful gene expression patterns and distinguishing those patterns from noise. Such noise (random gene expression levels) can be generated by (1) variability among microarrays, (2) variability in RNA labeling and hybridization methods, and, perhaps most importantly, (3) biologic variability among samples. It is likely that all of the above sources of variability are significant. It has become clear that the successful elucidation of genetic networks through expression profiling will require the expertise of a new generation of scientists, namely, computational biologists. Improvements in DNA microarray fabrication will only become valuable if pattern recognition algorithms are similarly developed. Nonetheless, it is likely that the future of cancer diagnostics will include the analysis of gene expression profiles which might help guide treatment planning of individual patients.

Polymerase Chain Reaction

Another important use of cDNA technology has allowed PCR to be applied to RNA. Because the Taq polymerase is a DNA polymerase (see above), it cannot use RNA as a template. Simply adding primers and Taq polymerase to an RNA preparation will not result in amplification. However, if an RNA of interest could be made into DNA, then PCR would proceed as usual.

The first step in this analysis is generating a cDNA copy of the mRNA of interest using reverse transcriptase. This can be done using a primer consisting of Ts (complementary to the poly(A) tail) or a primer complementary to some portion of the 3′ region of the mRNA. The 5′ primer can then be added along with Taq polymerase, and the single-stranded cDNA made in the first step will be amplified as described above (see Figure 2-7). In one of the first applications of this technique, Philadelphia chromosome (Ph)-positive leukemias were diagnosed by identifying chimeric bcr-abl mRNA species in clinical material using PCR. Since then, so-called reverse transcriptase PCR (RT-PCR) has come into widespread use.88

One inherent problem in using RT-PCR to monitor mRNA expression is quantitation of the amplified PCR products. In Northern blotting or nuclease protection analysis, the intensity of the hybridization signal is directly proportional to the amount of target RNA in the sample. Thus, one can compare the number of RNA molecules in one sample with another. With PCR, a slight change in the efficiency of polymerization in an early cycle in one sample will lead to a geometrically increasing discrepancy between the amount of amplified product in that sample compared with another sample, and the amounts of PCR product when the reaction reaches saturation can also differ significantly. Fortunately, a number of techniques have been described for normalizing the products of PCR reactions to allow quantitative comparisons. Most notably, quantitative real-time PCR89 is a method for continuous monitoring of amplification. This method makes quantitative comparisons during the exponential phase of the reaction (the range that appears linear on a logarithmic plot), where each cycle gives a constant fold increase in product. In brief, quantitative real-time PCR takes advantage of a fluorogenic probe within the amplified region, containing a fluorescent tag on one end and a quencher on the other end. As amplification proceeds, the probe is digested, liberating the fluorescent molecule from the quencher; the increase in fluorescence is proportional to the amplification.
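The geometric growth of a small efficiency difference can be checked with the usual idealized amplification model, in which the number of copies after n cycles is N0 × (1 + E)^n for a per-cycle efficiency E between 0 and 1. The sketch below is an assumption-laden illustration: the starting copy numbers, efficiencies, cycle count, and detection threshold are all invented, and the threshold-cycle calculation reflects common real-time PCR practice rather than anything spelled out in the text above.

```python
# Idealized PCR amplification model (all numbers are hypothetical).
import math


def copies_after(n0, efficiency, cycles):
    """Copies after `cycles` rounds, assuming a constant per-cycle efficiency."""
    return n0 * (1.0 + efficiency) ** cycles


def cycles_to_threshold(n0, threshold, efficiency=1.0):
    """Cycle number at which the product reaches a fixed detection threshold."""
    return math.log(threshold / n0) / math.log(1.0 + efficiency)


# Same starting template, slightly different efficiencies, 30 cycles.
sample_a = copies_after(1000, 0.95, 30)
sample_b = copies_after(1000, 0.90, 30)
discrepancy = sample_a / sample_b
print(f"end-point discrepancy: {discrepancy:.2f}-fold")

# Real-time readout: a 10-fold difference in template shifts the threshold cycle.
ct_low = cycles_to_threshold(1_000, 1e10)
ct_high = cycles_to_threshold(10_000, 1e10)
delta_ct = ct_low - ct_high
print(f"threshold-cycle difference: {delta_ct:.2f} cycles")
```

With identical starting material, a five-percentage-point difference in efficiency produces roughly a twofold difference in end-point product in this model, which is why end-point comparisons are unreliable. Comparing the cycle at which each reaction crosses a threshold during exponential growth avoids the saturation problem.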

Ribozymes and RNA Interference

Some RNA molecules, in addition to proteins, have enzymatic activity. These RNAs, called ribozymes, can cleave RNA at sequence-specific sites.90 They were originally discovered in Tetrahymena, when it appeared that some of the primary RNA molecules in that species were capable of splicing out their introns without the aid of any protein enzymes. Ribozymes were recently described in higher organisms, and it is likely that they will be found to play a universal and important role in RNA processing. Sequence-specific ribozymes that will destroy specific mRNAs can be synthesized. One application of this technology is the introduction into malignant cells of ribozymes directed against activated oncogenes.

A different method to disrupt cellular RNA molecules is RNA interference (RNAi). In RNA interference, double-stranded RNA molecules can lead to the degradation of specific cellular messenger RNA species.91 This method has now been applied to mammalian cells, using 21-base-pair RNA duplexes92 or plasmids permitting stable expression of the interfering RNA.93

Summary

The genetic information in DNA is copied, or transcribed, into mRNA by the enzyme RNA polymerase II. Before being transported to the cytoplasm, primary transcripts in the nucleus are modified by splicing out introns, adding a 5′ cap and adding a 3′ poly(A) tract. Cytoplasmic mRNA can be detected by Northern blotting, nuclease protection assays, or by modified PCR. Although nuclease protection assays are technically somewhat more demanding than Northern blotting, they are more sensitive and can provide structural information about mRNA transcripts. A retroviral enzyme called reverse transcriptase can make cDNA copies of mRNA transcripts. These cDNAs can be cloned into cDNA libraries, which are useful for isolating and analyzing expressed genes. In the future, ribozymes and/or RNA interference may be useful for the selective elimination of specific mRNA species.

What is the difference between primary transcript and mRNA?

The initial product of transcription of a protein coding gene is called the pre-mRNA (or primary transcript). After it has been processed and is ready to be exported from the nucleus, it is called the mature mRNA or processed mRNA.

What is the difference between primary RNA and mRNA?

The main difference is that the primary RNA transcript is the direct product of transcription of a gene, whereas mRNA is the processed form of that transcript after post-transcriptional modification. The mRNA serves as the template for producing a particular amino acid sequence during translation on ribosomes.

Are mRNA and an RNA transcript the same?

Messenger RNA (mRNA) molecules carry the coding sequences for protein synthesis and are called transcripts; ribosomal RNA (rRNA) molecules form the core of a cell's ribosomes (the structures in which protein synthesis takes place); and transfer RNA (tRNA) molecules carry amino acids to the ribosomes during protein ...

How does a primary transcript differ from the mRNA molecule that leaves the nucleus?

The primary transcript is much longer than mature mRNA because of the presence of introns in the former.