Describe two ways in which RNA differs from DNA

Transcription and translation are the means by which cells read out, or express, the genetic instructions in their genes. Because many identical RNA copies can be made from the same gene, and each RNA molecule can direct the synthesis of many identical protein molecules, cells can synthesize a large amount of protein rapidly when necessary. But each gene can also be transcribed and translated with a different efficiency, allowing the cell to make vast quantities of some proteins and tiny quantities of others (Figure 6-3). Moreover, as we see in the next chapter, a cell can change (or regulate) the expression of each of its genes according to the needs of the moment—most obviously by controlling the production of its RNA.

The first step a cell takes in reading out a needed part of its genetic instructions is to copy a particular portion of its DNA nucleotide sequence—a gene—into an RNA nucleotide sequence. The information in RNA, although copied into another chemical form, is still written in essentially the same language as it is in DNA—the language of a nucleotide sequence. Hence the name transcription.

Like DNA, RNA is a linear polymer made of four different types of nucleotide subunits linked together by phosphodiester bonds (Figure 6-4). It differs from DNA chemically in two respects: (1) the nucleotides in RNA are ribonucleotides—that is, they contain the sugar ribose (hence the name ribonucleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the bases adenine (A), guanine (G), and cytosine (C), it contains the base uracil (U) instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen-bonding with A (Figure 6-5), the complementary base-pairing properties described for DNA in Chapters 4 and 5 apply also to RNA (in RNA, G pairs with C, and A pairs with U). It is not uncommon, however, to find other types of base pairs in RNA: for example, G pairing with U occasionally.
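The pairing rules in this paragraph can be written as a small lookup. The following is a minimal Python sketch, not a standard library or published tool; the names `RNA_PAIRS`, `GU_PAIRS`, `can_pair`, and `dna_to_rna_alphabet` are illustrative assumptions.

```python
# Standard pairs in RNA, as stated in the text: G pairs with C, A pairs with U.
RNA_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}

# The less common G-U pair mentioned in the text (kept separate on purpose).
GU_PAIRS = {("G", "U"), ("U", "G")}


def can_pair(a: str, b: str, allow_gu: bool = False) -> bool:
    """Return True if RNA bases a and b can pair under the rules above."""
    pair = (a.upper(), b.upper())
    return pair in RNA_PAIRS or (allow_gu and pair in GU_PAIRS)


def dna_to_rna_alphabet(seq: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet by replacing T with U."""
    return seq.upper().replace("T", "U")
```

Keeping the G-U pair behind a flag mirrors the text: it occurs in RNA, but it is not one of the standard complementary pairs.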

Despite these small chemical differences, DNA and RNA differ quite dramatically in overall structure. Whereas DNA always occurs in cells as a double-stranded helix, RNA is single-stranded. RNA chains therefore fold up into a variety of shapes, just as a polypeptide chain folds up to form the final shape of a protein (Figure 6-6). As we see later in this chapter, the ability to fold into complex three-dimensional shapes allows some RNA molecules to have structural and catalytic functions.

All of the RNA in a cell is made by DNA transcription, a process that has certain similarities to the process of DNA replication discussed in Chapter 5. Transcription begins with the opening and unwinding of a small portion of the DNA double helix to expose the bases on each DNA strand. One of the two strands of the DNA double helix then acts as a template for the synthesis of an RNA molecule. As in DNA replication, the nucleotide sequence of the RNA chain is determined by the complementary base-pairing between incoming nucleotides and the DNA template. When a good match is made, the incoming ribonucleotide is covalently linked to the growing RNA chain in an enzymatically catalyzed reaction. The RNA chain produced by transcription—the transcript—is therefore elongated one nucleotide at a time, and it has a nucleotide sequence that is exactly complementary to the strand of DNA used as the template (Figure 6-7).
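As a rough illustration of template-directed synthesis, the Python sketch below builds a transcript one nucleotide at a time from a template strand. It assumes the template is written 5′ to 3′ and that the transcript is antiparallel to it, as for the two strands of a DNA double helix; the function name `transcribe` is hypothetical.

```python
# Ribonucleotide that pairs with each DNA template base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA transcript (written 5' to 3') for a DNA template strand.

    The template is given 5' to 3'. Because the transcript is antiparallel
    to the template, the template is walked from its 3' end.
    """
    rna = []
    for base in reversed(template_5to3.upper()):
        rna.append(DNA_TO_RNA[base])  # add one complementary ribonucleotide
    return "".join(rna)
```

For example, `transcribe("TACGGT")` returns `"ACCGUA"`, which is complementary to the template when the two strands are aligned in opposite orientations.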

Transcription, however, differs from DNA replication in several crucial ways. Unlike a newly formed DNA strand, the RNA strand does not remain hydrogen-bonded to the DNA template strand. Instead, just behind the region where the ribonucleotides are being added, the RNA chain is displaced and the DNA helix re-forms. Thus, the RNA molecules produced by transcription are released from the DNA template as single strands. In addition, because they are copied from only a limited region of the DNA, RNA molecules are much shorter than DNA molecules. A DNA molecule in a human chromosome can be up to 250 million nucleotide-pairs long; in contrast, most RNAs are no more than a few thousand nucleotides long, and many are considerably shorter.

The enzymes that perform transcription are called RNA polymerases. Like the DNA polymerase that catalyzes DNA replication (discussed in Chapter 5), RNA polymerases catalyze the formation of the phosphodiester bonds that link the nucleotides together to form a linear chain. The RNA polymerase moves stepwise along the DNA, unwinding the DNA helix just ahead of the active site for polymerization to expose a new region of the template strand for complementary base-pairing. In this way, the growing RNA chain is extended by one nucleotide at a time in the 5′-to-3′ direction (Figure 6-8). The substrates are nucleoside triphosphates (ATP, CTP, UTP, and GTP); as for DNA replication, a hydrolysis of high-energy bonds provides the energy needed to drive the reaction forward (see Figure 5-4).

The almost immediate release of the RNA strand from the DNA as it is synthesized means that many RNA copies can be made from the same gene in a relatively short time, the synthesis of additional RNA molecules being started before the first RNA is completed (Figure 6-9). When RNA polymerase molecules follow hard on each other's heels in this way, each moving at about 20 nucleotides per second (the speed in eucaryotes), over a thousand transcripts can be synthesized in an hour from a single gene.
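The figure of over a thousand transcripts per hour can be checked with simple arithmetic. The Python sketch below assumes a steady stream of evenly spaced polymerases, so that one transcript is completed each time a polymerase travels one spacing length; this model and the function names are assumptions for illustration, and the text itself does not state a spacing.

```python
SECONDS_PER_HOUR = 3600


def transcripts_per_hour(speed_nt_per_s: float, spacing_nt: float) -> float:
    """Transcripts completed per hour when polymerases are spacing_nt apart."""
    return speed_nt_per_s * SECONDS_PER_HOUR / spacing_nt


def max_spacing_nt(speed_nt_per_s: float, target_per_hour: float) -> float:
    """Largest polymerase spacing that still reaches target_per_hour."""
    return speed_nt_per_s * SECONDS_PER_HOUR / target_per_hour
```

Under this model, `max_spacing_nt(20, 1000)` gives 72.0, so at 20 nucleotides per second the polymerases would need to follow one another at intervals of roughly 72 nucleotides or less to exceed a thousand transcripts per hour.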

Although RNA polymerase catalyzes essentially the same chemical reaction as DNA polymerase, there are some important differences between the two enzymes. First, and most obvious, RNA polymerase catalyzes the linkage of ribonucleotides, not deoxyribonucleotides. Second, unlike the DNA polymerases involved in DNA replication, RNA polymerases can start an RNA chain without a primer. This difference may exist because transcription need not be as accurate as DNA replication (see Table 5-1, p. 243). Unlike DNA, RNA does not permanently store genetic information in cells. RNA polymerases make about one mistake for every 10⁴ nucleotides copied into RNA (compared with an error rate for direct copying by DNA polymerase of about one in 10⁷ nucleotides), and the consequences of an error in RNA transcription are much less significant than those of an error in DNA replication.
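To get a feel for what these error rates mean for a single molecule, the Python sketch below computes the expected number of errors in a copy of a given length and the chance that the copy is error-free. It assumes that errors occur independently at each position, which is a simplification; the function names are illustrative.

```python
RNA_POL_ERROR_RATE = 1e-4  # about one mistake per 10^4 nucleotides (from the text)
DNA_POL_ERROR_RATE = 1e-7  # about one mistake per 10^7 nucleotides (from the text)


def expected_errors(length_nt: int, error_rate: float) -> float:
    """Expected number of misincorporated nucleotides in one copy."""
    return length_nt * error_rate


def prob_error_free(length_nt: int, error_rate: float) -> float:
    """Probability that a copy contains no errors (independent errors assumed)."""
    return (1.0 - error_rate) ** length_nt
```

For a transcript 1,000 nucleotides long, `expected_errors(1000, RNA_POL_ERROR_RATE)` is about 0.1, so under this assumption roughly nine out of ten such transcripts would carry no error at all.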

Although RNA polymerases are not nearly as accurate as the DNA polymerases that replicate DNA, they nonetheless have a modest proofreading mechanism. If the incorrect ribonucleotide is added to the growing RNA chain, the polymerase can back up, and the active site of the enzyme can perform an excision reaction that mimics the reverse of the polymerization reaction, except that water instead of pyrophosphate is used (see Figure 5-4). RNA polymerase hovers around a misincorporated ribonucleotide longer than it does for a correct addition, causing excision to be favored for incorrect nucleotides. However, RNA polymerase also excises many correct bases as part of the cost for improved accuracy.

The majority of genes carried in a cell's DNA specify the amino acid sequence of proteins; the RNA molecules that are copied from these genes (which ultimately direct the synthesis of proteins) are called messenger RNA (mRNA) molecules. The final product of a minority of genes, however, is the RNA itself. Careful analysis of the complete DNA sequence of the genome of the yeast S. cerevisiae has uncovered well over 750 genes (somewhat more than 10% of the total number of yeast genes) that produce RNA as their final product, although this number includes multiple copies of some highly repeated genes. These RNAs, like proteins, serve as enzymatic and structural components for a wide variety of processes in the cell. In Chapter 5 we encountered one of those RNAs, the template carried by the enzyme telomerase. Although not all of their functions are known, we see in this chapter that some small nuclear RNA (snRNA) molecules direct the splicing of pre-mRNA to form mRNA, that ribosomal RNA (rRNA) molecules form the core of ribosomes, and that transfer RNA (tRNA) molecules form the adaptors that select amino acids and hold them in place on a ribosome for incorporation into protein (Table 6-1).

Each transcribed segment of DNA is called a transcription unit. In eucaryotes, a transcription unit typically carries the information of just one gene, and therefore codes for either a single RNA molecule or a single protein (or group of related proteins if the initial RNA transcript is spliced in more than one way to produce different mRNAs). In bacteria, a set of adjacent genes is often transcribed as a unit; the resulting mRNA molecule therefore carries the information for several distinct proteins.

Overall, RNA makes up a few percent of a cell's dry weight. Most of the RNA in cells is rRNA; mRNA comprises only 3–5% of the total RNA in a typical mammalian cell. The mRNA population is made up of tens of thousands of different species, and there are on average only 10–15 molecules of each species of mRNA present in each cell.

To transcribe a gene accurately, RNA polymerase must recognize where on the genome to start and where to finish. The way in which RNA polymerases perform these tasks differs somewhat between bacteria and eucaryotes. Because the process in bacteria is simpler, we look there first.

The initiation of transcription is an especially important step in gene expression because it is the main point at which the cell regulates which proteins are to be produced and at what rate. Bacterial RNA polymerase is a multisubunit complex. A detachable subunit, called sigma (σ) factor, is largely responsible for its ability to read the signals in the DNA that tell it where to begin transcribing (Figure 6-10). RNA polymerase molecules adhere only weakly to the bacterial DNA when they collide with it, and a polymerase molecule typically slides rapidly along the long DNA molecule until it dissociates again. However, when the polymerase slides into a region on the DNA double helix called a promoter, a special sequence of nucleotides indicating the starting point for RNA synthesis, it binds tightly to it. The polymerase, using its σ factor, recognizes this DNA sequence by making specific contacts with the portions of the bases that are exposed on the outside of the helix (Step 1 in Figure 6-10).

After the RNA polymerase binds tightly to the promoter DNA in this way, it opens up the double helix to expose a short stretch of nucleotides on each strand (Step 2 in Figure 6-10). Unlike a DNA helicase reaction (see Figure 5-15), this limited opening of the helix does not require the energy of ATP hydrolysis. Instead, the polymerase and DNA both undergo reversible structural changes that result in a more energetically favorable state. With the DNA unwound, one of the two exposed DNA strands acts as a template for complementary base-pairing with incoming ribonucleotides (see Figure 6-7), two of which are joined together by the polymerase to begin an RNA chain. After the first ten or so nucleotides of RNA have been synthesized (a relatively inefficient process during which polymerase synthesizes and discards short nucleotide oligomers), the σ factor relaxes its tight hold on the polymerase and eventually dissociates from it. During this process, the polymerase undergoes additional structural changes that enable it to move forward rapidly, transcribing without the σ factor (Step 4 in Figure 6-10). Chain elongation continues (at a speed of approximately 50 nucleotides/sec for bacterial RNA polymerases) until the enzyme encounters a second signal in the DNA, the terminator (described below), where the polymerase halts and releases both the DNA template and the newly made RNA chain (Step 7 in Figure 6-10). After the polymerase has been released at a terminator, it reassociates with a free σ factor and searches for a new promoter, where it can begin the process of transcription again.

Several structural features of bacterial RNA polymerase make it particularly adept at performing the transcription cycle just described. Once the σ factor positions the polymerase on the promoter and the template DNA has been unwound and pushed to the active site, a pair of moveable jaws is thought to clamp onto the DNA (Figure 6-11). When the first 10 nucleotides have been transcribed, the dissociation of σ allows a flap at the back of the polymerase to close to form an exit tunnel through which the newly made RNA leaves the enzyme. With the polymerase now functioning in its elongation mode, a rudder-like structure in the enzyme continuously pries apart the DNA-RNA hybrid formed. We can view the series of conformational changes that takes place during transcription initiation as a successive tightening of the enzyme around the DNA and RNA to ensure that it does not dissociate before it has finished transcribing a gene. If an RNA polymerase does dissociate prematurely, it cannot resume synthesis but must start over again at the promoter.

How do the signals in the DNA (termination signals) stop the elongating polymerase? For most bacterial genes a termination signal consists of a string of A-T nucleotide pairs preceded by a two-fold symmetric DNA sequence, which, when transcribed into RNA, folds into a “hairpin” structure through Watson-Crick base-pairing (see Figure 6-10). As the polymerase transcribes across a terminator, the hairpin may help to wedge open the movable flap on the RNA polymerase and release the RNA transcript from the exit tunnel. At the same time, the DNA-RNA hybrid in the active site, which is held together predominantly by U-A base pairs (which are less stable than G-C base pairs because they form two rather than three hydrogen bonds per base pair), is not strong enough to hold the RNA in place, and it dissociates, causing the release of the polymerase from the DNA, perhaps by forcing open its jaws. Thus, in some respects, transcription termination seems to involve a reversal of the structural transitions that happen during initiation. The process of termination also is an example of a common theme in this chapter: the ability of RNA to fold into specific structures figures prominently in many aspects of decoding the genome.
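The two sequence features of this kind of terminator, a self-complementary stretch that can form a hairpin followed by a run of U in the transcript, can be searched for with a simple scan. The Python sketch below is a deliberately naive illustration and not a real terminator-prediction method; the function names and the default stem length, loop range, and U-run length are assumptions chosen only for the example.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(rna))


def find_terminator_candidates(rna, stem=5, min_loop=3, max_loop=8, u_run=4):
    """Return start indices of stem-loop candidates followed by a run of U.

    A candidate is a stretch of `stem` bases, a loop of min_loop..max_loop
    bases, the reverse complement of the first stretch, then `u_run` U's.
    """
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop - u_run + 1):
        left = rna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop                      # start of the right arm
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + u_run]
            if len(right) < stem or len(tail) < u_run:
                break                                # ran off the end
            if right == reverse_complement_rna(left) and tail == "U" * u_run:
                hits.append(i)
                break
    return hits
```

On the made-up sequence `"CCGCCGCUUCGGCGGCUUUUUU"`, the scan reports a single candidate at index 2, where `GCCGC` can pair with `GCGGC` across a four-base loop and is followed by a run of U. Real terminators vary far more than this fixed pattern allows, which is part of why such signals are hard to locate by sequence inspection alone.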

As we have just seen, the processes of transcription initiation and termination involve a complicated series of structural transitions in protein, DNA, and RNA molecules. It is perhaps not surprising that the signals encoded in DNA that specify these transitions are difficult for researchers to recognize. Indeed, a comparison of many different bacterial promoters reveals that they are heterogeneous in DNA sequence. Nevertheless, they all contain related sequences, reflecting in part aspects of the DNA that are recognized directly by the σ factor. These common features are often summarized in the form of a consensus sequence (Figure 6-12). In general, a consensus nucleotide sequence is derived by comparing many sequences with the same basic function and tallying up the most common nucleotide found at each position. It therefore serves as a summary or “average” of a large number of individual nucleotide sequences.
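The tallying procedure described here is straightforward to express in code. The Python sketch below assumes the sequences are already aligned and of equal length; the function name `consensus` and the example sequences are invented for illustration and are not taken from Figure 6-12.

```python
from collections import Counter


def consensus(sequences):
    """Return the most common nucleotide at each position of aligned sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*sequences):          # one column per position
        counts = Counter(column)            # tally nucleotides at this position
        # On a tie, most_common returns the nucleotide seen first.
        result.append(counts.most_common(1)[0][0])
    return "".join(result)


# Toy input: four made-up six-nucleotide sequences.
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
print(consensus(toy_sites))  # prints TATAAT
```

Note that in this toy set only one of the four input sequences matches the consensus exactly, which illustrates why a consensus is a summary rather than a sequence that every promoter contains.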

One reason that individual bacterial promoters differ in DNA sequence is that the precise sequence determines the strength (or number of initiation events per unit time) of the promoter. Evolutionary processes have thus fine-tuned each promoter to initiate as often as necessary and have created a wide spectrum of promoters. Promoters for genes that code for abundant proteins are much stronger than those associated with genes that encode rare proteins, and their nucleotide sequences are responsible for these differences.

Like bacterial promoters, transcription terminators also include a wide range of sequences, with the potential to form a simple RNA structure being the most important common feature. Since an almost unlimited number of nucleotide sequences have this potential, terminator sequences are much more heterogeneous than those of promoters.

We have discussed bacterial promoters and terminators in some detail to illustrate an important point regarding the analysis of genome sequences. Although we know a great deal about bacterial promoters and terminators and can develop consensus sequences that summarize their most salient features, their variation in nucleotide sequence makes it difficult for researchers (even when aided by powerful computers) to definitively locate them simply by inspection of the nucleotide sequence of a genome. When we encounter analogous types of sequences in eucaryotes, the problem of locating them is even more difficult. Often, additional information, some of it from direct experimentation, is needed to accurately locate the short DNA signals contained in genomes.

Promoter sequences are asymmetric (see Figure 6-12), and this feature has important consequences for their arrangement in genomes. Since DNA is double-stranded, two different RNA molecules could in principle be transcribed from any gene, using each of the two DNA strands as a template. However, a gene typically has only a single promoter, and because the nucleotide sequences of bacterial (as well as eucaryotic) promoters are asymmetric, the polymerase can bind in only one orientation. The polymerase thus has no option but to transcribe the one DNA strand, since it can synthesize RNA only in the 5′ to 3′ direction (Figure 6-13). The choice of template strand for each gene is therefore determined by the location and orientation of the promoter. Genome sequences reveal that the DNA strand used as the template for RNA synthesis varies from gene to gene (Figure 6-14; see also Figure 1-31).
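The link between promoter orientation and template strand can be made concrete with a short Python sketch. It assumes a stretch of double-stranded DNA represented by its top strand written 5′ to 3′, and a promoter orientation given as `"+"` (polymerase moves left to right) or `"-"` (right to left); this representation and the function names are assumptions for illustration.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq: str) -> str:
    """Return the opposite DNA strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(seq.upper()))


def transcript_for_orientation(top_strand_5to3: str, orientation: str) -> str:
    """Return the RNA (5' to 3') made from a region, given promoter orientation.

    "+": the bottom strand is the template, so the RNA matches the top strand.
    "-": the top strand is the template, so the RNA matches the bottom strand.
    """
    if orientation == "+":
        rna_like = top_strand_5to3.upper()
    elif orientation == "-":
        rna_like = reverse_complement_dna(top_strand_5to3)
    else:
        raise ValueError("orientation must be '+' or '-'")
    return rna_like.replace("T", "U")
```

For the same region `"ATGGCC"`, the `"+"` orientation yields `"AUGGCC"` and the `"-"` orientation yields `"GGCCAU"`: two different RNAs from one stretch of DNA, with the promoter deciding which is made.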

Having considered transcription in bacteria, we now turn to the situation in eucaryotes, where the synthesis of RNA molecules is a much more elaborate affair.

In contrast to bacteria, which contain a single type of RNA polymerase, eucaryotic nuclei have three, called RNA polymerase I, RNA polymerase II, and RNA polymerase III. The three polymerases are structurally similar to one another (and to the bacterial enzyme). They share some common subunits and many structural features, but they transcribe different types of genes (Table 6-2). RNA polymerases I and III transcribe the genes encoding transfer RNA, ribosomal RNA, and various small RNAs. RNA polymerase II transcribes the vast majority of genes, including all those that encode proteins, and our subsequent discussion therefore focuses on this enzyme.

Although eucaryotic RNA polymerase II has many structural similarities to bacterial RNA polymerase (Figure 6-15), there are several important differences in the way in which the bacterial and eucaryotic enzymes function, two of which concern us immediately.

1. While bacterial RNA polymerase (with σ factor as one of its subunits) is able to initiate transcription on a DNA template in vitro without the help of additional proteins, eucaryotic RNA polymerases cannot. They require the help of a large set of proteins called general transcription factors, which must assemble at the promoter with the polymerase before the polymerase can begin transcription.

2. Eucaryotic transcription initiation must deal with the packing of DNA into nucleosomes and higher order forms of chromatin structure, features absent from bacterial chromosomes.

The discovery that, unlike bacterial RNA polymerase, purified eucaryotic RNA polymerase II could not initiate transcription in vitro led to the discovery and purification of the additional factors required for this process. These general transcription factors help to position the RNA polymerase correctly at the promoter, aid in pulling apart the two strands of DNA to allow transcription to begin, and release RNA polymerase from the promoter into the elongation mode once transcription has begun. The proteins are “general” because they assemble on all promoters used by RNA polymerase II; consisting of a set of interacting proteins, they are designated as TFII (for transcription factor for polymerase II), and listed as TFIIA, TFIIB, and so on. In a broad sense, the eucaryotic general transcription factors carry out functions equivalent to those of the σ factor in bacteria.

Figure 6-16 shows how the general transcription factors assemble in vitro at promoters used by RNA polymerase II. The assembly process starts with the binding of the general transcription factor TFIID to a short double-helical DNA sequence primarily composed of T and A nucleotides. For this reason, this sequence is known as the TATA sequence, or TATA box, and the subunit of TFIID that recognizes it is called TBP (for TATA-binding protein). The TATA box is typically located 25 nucleotides upstream from the transcription start site. It is not the only DNA sequence that signals the start of transcription (Figure 6-17), but for most polymerase II promoters, it is the most important. The binding of TFIID causes a large distortion in the DNA of the TATA box (Figure 6-18). This distortion is thought to serve as a physical landmark for the location of an active promoter in the midst of a very large genome, and it brings DNA sequences on both sides of the distortion together to allow for subsequent protein assembly steps. Other factors are then assembled, along with RNA polymerase II, to form a complete transcription initiation complex (see Figure 6-16).
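As a small illustration of the positional relationship described here, the Python sketch below reports how far upstream of a given transcription start site each occurrence of the literal string `TATA` lies. Matching the exact string `TATA` is a simplification made for this example (real TATA sequences vary), and the function name and test sequence are hypothetical.

```python
def tata_upstream_distances(dna: str, start_site: int, motif: str = "TATA"):
    """Return distances (in nucleotides) from each upstream motif to the start site.

    `start_site` is the 0-based index of the first transcribed nucleotide.
    The distance is measured from the first base of the motif.
    """
    dna = dna.upper()
    distances = []
    pos = dna.find(motif)
    while pos != -1 and pos < start_site:
        distances.append(start_site - pos)
        pos = dna.find(motif, pos + 1)      # allow overlapping matches
    return distances


# Made-up sequence: a TATA-containing stretch placed 25 nucleotides upstream.
example = "G" * 10 + "TATAAA" + "C" * 19 + "ATGGCA"
print(tata_upstream_distances(example, start_site=35))  # prints [25]
```

A scan like this finds many chance matches in a real genome, so it could at best suggest candidate sites, not identify promoters.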

After RNA polymerase II has been guided onto the promoter DNA to form a transcription initiation complex, it must gain access to the template strand at the transcription start point. This step is aided by one of the general transcription factors, TFIIH, which contains a DNA helicase. Next, like the bacterial polymerase, polymerase II remains at the promoter, synthesizing short lengths of RNA until it undergoes a conformational change and is released to begin transcribing a gene. A key step in this release is the addition of phosphate groups to the “tail” of the RNA polymerase (known as the CTD or C-terminal domain). This phosphorylation is also catalyzed by TFIIH, which, in addition to a helicase, contains a protein kinase as one of its subunits (see Figure 6-16, D and E). The polymerase can then disengage from the cluster of general transcription factors, undergoing a series of conformational changes that tighten its interaction with DNA and acquiring new proteins that allow it to transcribe for long distances without dissociating.

Once the polymerase II has begun elongating the RNA transcript, most of the general transcription factors are released from the DNA so that they are available to initiate another round of transcription with a new RNA polymerase molecule. As we see shortly, the phosphorylation of the tail of RNA polymerase II also causes components of the RNA processing machinery to load onto the polymerase and thus be in position to modify the newly transcribed RNA as it emerges from the polymerase.

The model for transcription initiation just described was established by studying the action of RNA polymerase II and its general transcription factors on purified DNA templates in vitro. However, as discussed in Chapter 4, DNA in eucaryotic cells is packaged into nucleosomes, which are further arranged in higher-order chromatin structures. As a result, transcription initiation in a eucaryotic cell is more complex and requires more proteins than it does on purified DNA. First, gene regulatory proteins known as transcriptional activators bind to specific sequences in DNA and help to attract RNA polymerase II to the start point of transcription (Figure 6-19). This attraction is needed to help the RNA polymerase and the general transcription factors in overcoming the difficulty of binding to DNA that is packaged in chromatin. We discuss the role of activators in Chapter 7, because they represent one of the main ways in which cells regulate expression of their genes. Here we simply note that their presence on DNA is required for transcription initiation in a eucaryotic cell. Second, eucaryotic transcription initiation in vivo requires the presence of a protein complex known as the mediator, which allows the activator proteins to communicate properly with the polymerase II and with the general transcription factors. Finally, transcription initiation in the cell often requires the local recruitment of chromatin-modifying enzymes, including chromatin remodeling complexes and histone acetylases (see Figure 6-19). As discussed in Chapter 4, both types of enzymes can allow greater accessibility to the DNA present in chromatin, and by doing so, they facilitate the assembly of the transcription initiation machinery onto DNA.

As illustrated in Figure 6-19, many proteins (well over one hundred individual subunits) must assemble at the start point of transcription to initiate transcription in a eucaryotic cell. The order of assembly of these proteins is probably different for different genes and therefore may not follow a prescribed pathway. In fact, some of these different protein assemblies may interact with each other away from the DNA and be brought to DNA as preformed subcomplexes. For example, the mediator, RNA polymerase II, and some of the general transcription factors can bind to each other in the nucleoplasm and be brought to the DNA as a unit. We return to this issue in Chapter 7, where we discuss the many ways eucaryotic cells can regulate the process of transcription initiation.

Once it has initiated transcription, RNA polymerase does not proceed smoothly along a DNA molecule; rather it moves jerkily, pausing at some sequences and rapidly transcribing through others. Elongating RNA polymerases, both bacterial and eucaryotic, are associated with a series of elongation factors, proteins that decrease the likelihood that RNA polymerase will dissociate before it reaches the end of a gene. These factors typically associate with RNA polymerase shortly after initiation has occurred and help polymerases to move through the wide variety of different DNA sequences that are found in genes. Eucaryotic RNA polymerases must also contend with chromatin structure as they move along a DNA template. Experiments have shown that bacterial polymerases, which never encounter nucleosomes in vivo, can nonetheless transcribe through them in vitro, suggesting that a nucleosome is easily traversed. However, eucaryotic polymerases have to move through forms of chromatin that are more compact than a simple nucleosome. It therefore seems likely that they transcribe with the aid of chromatin remodeling complexes (see pp. 212–213). These complexes may move with the polymerase or may simply seek out and rescue the occasional stalled polymerase. In addition, some elongation factors associated with eucaryotic RNA polymerase facilitate transcription through nucleosomes without requiring additional energy. It is not yet understood how this is accomplished, but these proteins may help to dislodge parts of the nucleosome core as the polymerase transcribes the DNA of a nucleosome.

There is yet another barrier to elongating polymerases, both bacterial and eucaryotic. To discuss this issue, we need first to consider a subtle property inherent in the DNA double helix called DNA supercoiling. DNA supercoiling represents a conformation that DNA will adopt in response to superhelical tension; conversely, creating various loops or coils in the helix can create such tension. A simple way of visualizing the topological constraints that cause DNA supercoiling is illustrated in Figure 6-20A. There are approximately 10 nucleotide pairs for every helical turn in a DNA double helix. Imagine a helix whose two ends are fixed with respect to each other (as they are in a DNA circle, such as a bacterial chromosome, or in a tightly clamped loop, as is thought to exist in eucaryotic chromosomes). In this case, one large DNA supercoil will form to compensate for each 10 nucleotide pairs that are opened (unwound). The formation of this supercoil is energetically favorable because it restores a normal helical twist to the base-paired regions that remain, which would otherwise need to be overwound because of the fixed ends.
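The bookkeeping in this paragraph reduces to one division. The Python sketch below uses the approximate figure of 10 nucleotide pairs per helical turn given in the text; the function name is an assumption.

```python
BP_PER_HELICAL_TURN = 10  # approximate value stated in the text


def supercoils_from_unwinding(bp_opened: float) -> float:
    """Supercoils needed to compensate for opening bp_opened nucleotide pairs
    in a DNA molecule whose ends are fixed."""
    return bp_opened / BP_PER_HELICAL_TURN
```

Opening 10 nucleotide pairs therefore corresponds to one compensating supercoil, and opening 25 corresponds to two and a half.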

Superhelical tension is also created as RNA polymerase moves along a stretch of DNA that is anchored at its ends (Figure 6-20C). As long as the polymerase is not free to rotate rapidly (and such rotation is unlikely given the size of RNA polymerases and their attached transcripts), a moving polymerase generates positive superhelical tension in the DNA in front of it and negative superhelical tension behind it. For eucaryotes, this situation is thought to provide a bonus: the positive superhelical tension ahead of the polymerase makes the DNA helix more difficult to open, but this tension should facilitate the unwrapping of DNA in nucleosomes, as the release of DNA from the histone core helps to relax positive superhelical tension.

Any protein that propels itself alone along a DNA strand of a double helix tends to generate superhelical tension. In eucaryotes, DNA topoisomerase enzymes rapidly remove this superhelical tension (see p. 251). But, in bacteria, a specialized topoisomerase called DNA gyrase uses the energy of ATP hydrolysis to pump supercoils continuously into the DNA, thereby maintaining the DNA under constant tension. These are negative supercoils, having the opposite handedness from the positive supercoils that form when a region of DNA helix opens (see Figure 6-20B). These negative supercoils are removed from bacterial DNA whenever a region of helix opens, reducing the superhelical tension. DNA gyrase therefore makes the opening of the DNA helix in bacteria energetically favorable compared with helix opening in DNA that is not supercoiled. For this reason, it usually facilitates those genetic processes in bacteria, including the initiation of transcription by bacterial RNA polymerase, that require helix opening (see Figure 6-10).

We have seen that bacterial mRNAs are synthesized solely by the RNA polymerase starting and stopping at specific spots on the genome. The situation in eucaryotes is substantially different. In particular, transcription is only the first step in a series of reactions that includes the covalent modification of both ends of the RNA and the removal of intron sequences that are discarded from the middle of the RNA transcript by the process of RNA splicing (Figure 6-21). The modifications of the ends of eucaryotic mRNA are capping on the 5′ end and polyadenylation of the 3′ end (Figure 6-22). These special ends allow the cell to assess whether both ends of an mRNA molecule are present (and the message is therefore intact) before it exports the RNA sequence from the nucleus for translation into protein. In Chapter 4, we saw that a typical eucaryotic gene is present in the genome as short blocks of protein-coding sequence (exons) separated by long introns, and RNA splicing is the critically important step in which the different portions of a protein coding sequence are joined together. As we describe next, RNA splicing also provides higher eucaryotes with the ability to synthesize several different proteins from the same gene.

These RNA processing steps are tightly coupled to transcription elongation by an ingenious mechanism. As discussed previously, a key step of the transition of RNA polymerase II to the elongation mode of RNA synthesis is an extensive phosphorylation of the RNA polymerase II tail, called the CTD. This C-terminal domain of the largest subunit consists of a long tandem array of a repeated seven-amino-acid sequence, containing two serines per repeat that can be phosphorylated. Because there are 52 repeats in the CTD of human RNA polymerase II, its complete phosphorylation would add 104 negatively charged phosphate groups to the polymerase. This phosphorylation step not only dissociates the RNA polymerase II from other proteins present at the start point of transcription, it also allows a new set of proteins to associate with the RNA polymerase tail that function in transcription elongation and pre-mRNA processing. As discussed next, some of these processing proteins seem to “hop” from the polymerase tail onto the nascent RNA molecule to begin processing it as it emerges from the RNA polymerase. Thus, RNA polymerase II in its elongation mode can be viewed as an RNA factory that both transcribes DNA into RNA and processes the RNA it produces (Figure 6-23).

As soon as RNA polymerase II has produced about 25 nucleotides of RNA, the 5′ end of the new RNA molecule is modified by addition of a “cap” that consists of a modified guanine nucleotide (see Figure 6-22B). The capping reaction is performed by three enzymes acting in succession: one (a phosphatase) removes one phosphate from the 5′ end of the nascent RNA, another (a guanyl transferase) adds a GMP in a reverse linkage (5′ to 5′ instead of 5′ to 3′), and a third (a methyl transferase) adds a methyl group to the guanosine (Figure 6-24). Because all three enzymes bind to the phosphorylated RNA polymerase tail, they are poised to modify the 5′ end of the nascent transcript as soon as it emerges from the polymerase.

The 5′-methyl cap signals the 5′ end of eucaryotic mRNAs, and this landmark helps the cell to distinguish mRNAs from the other types of RNA molecules present in the cell. For example, RNA polymerases I and III produce uncapped RNAs during transcription, in part because these polymerases lack tails. In the nucleus, the cap binds a protein complex called CBC (cap-binding complex), which, as we discuss in subsequent sections, helps the RNA to be properly processed and exported. The 5′ methyl cap also has an important role in the translation of mRNAs in the cytosol as we discuss later in the chapter.

As discussed in Chapter 4, the protein coding sequences of eucaryotic genes are typically interrupted by noncoding intervening sequences (introns). Discovered in 1977, this feature of eucaryotic genes came as a surprise to scientists, who had been, until that time, familiar only with bacterial genes, which typically consist of a continuous stretch of coding DNA that is directly transcribed into mRNA. In marked contrast, eucaryotic genes were found to be broken up into small pieces of coding sequence (expressed sequences or exons) interspersed with much longer intervening sequences or introns; thus the coding portion of a eucaryotic gene is often only a small fraction of the length of the gene (Figure 6-25).

Both intron and exon sequences are transcribed into RNA. The intron sequences are removed from the newly synthesized RNA through the process of RNA splicing. The vast majority of RNA splicing that takes place in cells functions in the production of mRNA, and our discussion of splicing focuses on this type. It is termed precursor-mRNA (or pre-mRNA) splicing to denote that it occurs on RNA molecules destined to become mRNAs. Only after 5′ and 3′ end processing and splicing have taken place is such RNA termed mRNA.

Each splicing event removes one intron, proceeding through two sequential phosphoryl-transfer reactions known as transesterifications; these join two exons while removing the intron as a “lariat” (Figure 6-26). Since the number of phosphate bonds remains the same, these reactions could in principle take place without nucleoside triphosphate hydrolysis. However, the machinery that catalyzes pre-mRNA splicing is complex, consisting of 5 additional RNA molecules and over 50 proteins, and it hydrolyzes many ATP molecules per splicing event. This complexity is presumably needed to ensure that splicing is highly accurate, while also being sufficiently flexible to deal with the enormous variety of introns found in a typical eucaryotic cell. Frequent mistakes in RNA splicing would severely harm the cell, as they would result in malfunctioning proteins. We see in Chapter 7 that when rare splicing mistakes do occur, the cell has a “fail-safe” device to eliminate the incorrectly spliced mRNAs.
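The net outcome of splicing, as opposed to its chemistry, can be sketched as a string operation: exons are kept and joined in order, and whatever lies between them is discarded. The sequences and the function name `splice` below are hypothetical, and the sketch ignores the lariat intermediate and the spliceosome entirely.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, dropping everything between them."""
    previous_end = 0
    for start, end in exons:
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError("exons must be ordered, non-overlapping, and inside the transcript")
        previous_end = end
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: uppercase marks exon sequence, lowercase marks the intron.
exon1, intron, exon2 = "AUGGCC", "guaaguacuaacuuuuuuuag", "GAUUAA"
pre = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre))]

mature = splice(pre, exons)
print(mature)  # AUGGCCGAUUAA
```

Each intron removed in this sketch corresponds to one splicing event, that is, to one pair of transesterification reactions in the cell.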

It may seem wasteful to remove large numbers of introns by RNA splicing. In attempting to explain why it occurs, scientists have pointed out that the exon-intron arrangement would seem to facilitate the emergence of new and useful proteins. Thus, the presence of numerous introns in DNA allows genetic recombination to readily combine the exons of different genes (see p. 462), allowing genes for new proteins to evolve more easily by the combination of parts of preexisting genes. This idea is supported by the observation, described in Chapter 3, that many proteins in present-day cells resemble patchworks composed from a common set of protein pieces, called protein domains.

RNA splicing also has a present-day advantage. The transcripts of many eucaryotic genes (estimated at 60% of genes in humans) are spliced in a variety of different ways to produce a set of different mRNAs, thereby allowing a corresponding set of different proteins to be produced from the same gene (Figure 6-27). We discuss additional examples of alternative splicing in Chapter 7, as this is also one of the mechanisms that cells use to change expression of their genes. Rather than being the wasteful process it may have seemed at first sight, RNA splicing enables eucaryotes to increase the already enormous coding potential of their genomes. We shall return to this idea several times in this chapter and the next, but we first need to describe the cellular machinery that performs this remarkable task.
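To see how quickly alternative splicing can multiply the products of one gene, consider a simplified model in which some internal exons are optional and each may be included or skipped independently. This is an illustrative assumption: real genes use only a subset of the conceivable combinations, and the exon names below are hypothetical.

```python
from itertools import combinations


def isoforms(exons: list[str], optional: set[int]) -> list[str]:
    """Enumerate mRNAs obtained by including or skipping each optional exon (given by index)."""
    optional_sorted = sorted(optional)
    results = []
    for k in range(len(optional_sorted) + 1):
        for skipped in combinations(optional_sorted, k):
            kept = [e for i, e in enumerate(exons) if i not in skipped]
            results.append("-".join(kept))
    return results


variants = isoforms(["E1", "E2", "E3", "E4"], optional={1, 2})
print(variants)
# ['E1-E2-E3-E4', 'E1-E3-E4', 'E1-E2-E4', 'E1-E4']
```

Under this model, a gene with k optional exons could in principle yield 2^k different mRNAs.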

Introns range in size from about 10 nucleotides to over 100,000 nucleotides. Picking out the precise borders of an intron is very difficult for scientists to do (even with the aid of computers) when confronted by a complete genome sequence of a eucaryote. The possibility of alternative splicing compounds the problem of predicting protein sequences solely from a genome sequence. This difficulty constitutes one of the main barriers to identifying all of the genes in a complete genome sequence, and it is the primary reason that we know only the approximate number of genes in, for example, the human genome. Yet each cell in our body recognizes and rapidly excises the appropriate intron sequences with high fidelity. We have seen that intron sequence removal involves three positions on the RNA: the 5′ splice site, the 3′ splice site, and the branch point in the intron sequence that forms the base of the excised lariat. In pre-mRNA splicing, each of these three sites has a consensus nucleotide sequence that is similar from intron to intron, providing the cell with cues on where splicing is to take place (Figure 6-28). However, there is enough variation in each sequence to make it very difficult for scientists to pick out all of the many splicing signals in a genome sequence.
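A toy search illustrates why short, variable signals are hard to pick out of a sequence. The sketch below assumes, as a simplification not spelled out in the text, that an intron begins with GU and ends with AG, and it ignores the branch point and the rest of each consensus sequence. The transcript is hypothetical.

```python
def candidate_introns(rna: str, min_len: int = 10) -> list[tuple[int, int]]:
    """Return (start, end) of every GU...AG span at least min_len long (end-exclusive)."""
    donors = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "GU"]
    acceptors = [i + 2 for i in range(len(rna) - 1) if rna[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]


# Hypothetical transcript with one real intron between two exons.
exon1, intron, exon2 = "AUGGCC", "GUAAGUACUAACUUUUUUUAG", "GAUAGGUUAA"
rna = exon1 + intron + exon2
true_intron = (len(exon1), len(exon1) + len(intron))

hits = candidate_introns(rna)
print(len(hits), true_intron in hits)  # 4 True
```

Even in this 37-nucleotide example the dinucleotide rule alone yields four candidate introns, of which only one is correct. Over a whole genome the number of spurious matches is far larger.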

Unlike the other steps of mRNA production we have discussed, RNA splicing is performed largely by RNA molecules instead of proteins. RNA molecules recognize intron-exon borders and participate in the chemistry of splicing. These RNA molecules are relatively short (less than 200 nucleotides each), and there are five of them (U1, U2, U4, U5, and U6) involved in the major form of pre-mRNA splicing. Known as snRNAs (small nuclear RNAs), each is complexed with at least seven protein subunits to form a snRNP (small nuclear ribonucleoprotein). These snRNPs form the core of the spliceosome, the large assembly of RNA and protein molecules that performs pre-mRNA splicing in the cell.

The spliceosome is a dynamic machine; as we see below, it is assembled on pre-mRNA from separate components, and parts enter and leave it as the splicing reaction proceeds (Figure 6-29). During the splicing reaction, recognition of the 5′ splice junction, the branch point site and the 3′ splice junction is performed largely through base-pairing between the snRNAs and the consensus RNA sequences in the pre-mRNA substrate (Figure 6-30). In the course of splicing, the spliceosome undergoes several shifts in which one set of base-pair interactions is broken and another is formed in its place. For example, U1 is replaced by U6 at the 5′ splice junction (see Figure 6-30A). As we shall see, this type of RNA-RNA rearrangement (in which the formation of one RNA-RNA interaction requires the disruption of another) occurs several times during the splicing reaction. It permits the checking and rechecking of RNA sequences before the chemical reaction is allowed to proceed, thereby increasing the accuracy of splicing.

Although ATP hydrolysis is not required for the chemistry of RNA splicing per se, it is required for the stepwise assembly and rearrangements of the spliceosome. Some of the additional proteins that make up the spliceosome are RNA helicases, which use the energy of ATP hydrolysis to break existing RNA-RNA interactions so as to allow the formation of new ones. In fact, all the steps shown previously in Figure 6-29—except the association of BBP with the branch-point site and U1 snRNP with the 5′ splice site—require ATP hydrolysis and additional proteins. In all, more than 50 proteins, including those that form the snRNPs, are required for each splicing event.

The ATP-requiring RNA-RNA rearrangements that take place in the spliceosome occur within the snRNPs themselves and between the snRNPs and the pre-mRNA substrate. One of the most important roles of these rearrangements is the creation of the active catalytic site of the spliceosome. The strategy of creating an active site only after the assembly and rearrangement of splicing components on a pre-mRNA substrate is an important way of preventing wayward splicing.

Perhaps the most surprising feature of the spliceosome is the nature of the catalytic site itself: it is largely (if not exclusively) formed by RNA molecules instead of proteins. In the last section of this chapter we discuss in general terms the structural and chemical properties of RNA that allow it to perform catalysis; here we need only consider that the U2 and U6 snRNAs in the spliceosome form a precise three-dimensional RNA structure that juxtaposes the 5′ splice site of the pre-mRNA with the branch-point site and probably performs the first transesterification reaction (see Figure 6-30C). In a similar way, the 5′ and 3′ splice junctions are brought together (an event requiring the U5 snRNA) to facilitate the second transesterification.

Once the splicing chemistry is completed, the snRNPs remain bound to the lariat and the spliced product is released. The disassembly of these snRNPs from the lariat (and from each other) requires another series of RNA-RNA rearrangements that require ATP hydrolysis, thereby returning the snRNAs to their original configuration so that they can be used again in a new reaction.

As we have seen, intron sequences vary enormously in size, with some being in excess of 100,000 nucleotides. If splice-site selection were determined solely by the snRNPs acting on a preformed, protein-free RNA molecule, we would expect splicing mistakes—such as exon skipping and the use of cryptic splice sites—to be very common (Figure 6-31).

The fidelity mechanisms built into the spliceosome are supplemented by two additional factors that help ensure that splicing occurs accurately. These ordering influences in the pre-mRNA increase the probability that the appropriate pairs of 5′ and 3′ splice sites will be brought together in the spliceosome before the splicing chemistry begins. The first results from the assembly of the spliceosome occurring as the pre-mRNA emerges from a transcribing RNA polymerase II (see Figure 6-23). As for 5′ cap formation, several components of the spliceosome seem to be carried on the phosphorylated tail of RNA polymerase. Their transfer directly from the polymerase to the nascent pre-mRNA presumably helps the cell to keep track of introns and exons: the snRNPs at a 5′ splice site are initially presented with only a single 3′ splice site since the sites further downstream have not yet been synthesized. This feature helps to prevent inappropriate exon skipping.

The second factor that helps the cell to choose splice sites has been termed the “exon definition hypothesis,” and it is understood only in outline. Exon size tends to be much more uniform than intron size, averaging about 150 nucleotide pairs across a wide variety of eucaryotic organisms (Figure 6-32). As RNA synthesis proceeds, a group of spliceosome components, called the SR proteins (so named because they contain a domain rich in serines and arginines), are thought to assemble on exon sequences and mark off each 3′ and 5′ splice site starting at the 5′ end of the RNA (Figure 6-33). This assembly takes place in conjunction with the U1 snRNA, which marks one exon boundary, and U2AF, which initially helps to specify the other. By specifically marking the exons in this way, the cell increases the accuracy with which the initial splicing components are deposited on the nascent RNA and thereby helps to avoid cryptic splice sites. How the SR proteins discriminate exon sequences from intron sequences is not understood; however, it is known that some of the SR proteins bind preferentially to RNA sequences in specific exons. In principle, the redundancy in the genetic code could have been exploited during evolution to select for binding sites for SR proteins in exons, allowing these sites to be created without constraining amino acid sequences.

Both the marking out of exon and intron boundaries and the assembly of the spliceosome begin on an RNA molecule while it is still being elongated by RNA polymerase at its 3′ end. However, the actual chemistry of splicing can take place much later. This delay means that intron sequences are not necessarily removed from a pre-mRNA molecule in the order in which they occur along the RNA chain. It also means that, although spliceosome assembly is co-transcriptional, the splicing reactions sometimes occur posttranscriptionally—that is, after a complete pre-mRNA molecule has been made.

Simple eucaryotes such as yeast have only one set of snRNPs that perform all pre-mRNA splicing. However, more complex eucaryotes such as flies, mammals, and plants have a second set of snRNPs that direct the splicing of a small fraction of their intron sequences. This minor form of spliceosome recognizes a different set of RNA sequences at the 5′ and 3′ splice junctions and at the branch point; it is called the AT-AC spliceosome because of the nucleotide sequence determinants at its intron-exon borders (Figure 6-34). Despite recognizing different nucleotide sequences, the snRNPs in this spliceosome make the same types of RNA-RNA interactions with the pre-mRNA and with each other as do the major snRNPs (Figure 6-34B). The recent discovery of this class of snRNPs gives us confidence in the base-pair interactions deduced for the major spliceosome, because it provides an independent set of molecules that undergo the same RNA-RNA interactions despite differences in the RNA sequences involved.

A particular variation on splicing, called trans-splicing, has been discovered in a few eucaryotic organisms. These include the single-celled trypanosomes—protozoans that cause African sleeping sickness in humans—and the model multicellular organism, the nematode worm. In trans-splicing, exons from two separate RNA transcripts are spliced together to form a mature mRNA molecule (see Figure 6-34). Trypanosomes produce all of their mRNAs in this way, whereas only about 1% of nematode mRNAs are produced by trans-splicing. In both cases, a single exon is spliced onto the 5′ end of many different RNA transcripts produced by the cell; in this way, all of the products of trans-splicing have the same 5′ exon and different 3′ exons. Many of the same snRNPs that function in conventional splicing are used in this reaction, although trans-splicing uses a unique snRNP (called the SL RNP) that brings in the common exon (see Figure 6-34).
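The defining feature of trans-splicing, one shared 5′ exon joined to many different transcripts, can be sketched in a few lines. The leader sequence, gene names, and splice-site positions below are hypothetical, and the sketch models only the end product, not the SL RNP or the reaction itself.

```python
def trans_splice(leader_exon: str, transcripts: dict[str, tuple[str, int]]) -> dict[str, str]:
    """Attach the same leader exon to each transcript, discarding what lies upstream of its 3' splice site."""
    return {
        name: leader_exon + rna[acceptor:]
        for name, (rna, acceptor) in transcripts.items()
    }


leader = "GGUUUG"  # hypothetical common exon
transcripts = {
    # lowercase = sequence upstream of the 3' splice site, which is discarded
    "geneA": ("uuuucagAUGAAA", 7),
    "geneB": ("cuuuagAUGCCC", 6),
}

products = trans_splice(leader, transcripts)
print(products)
# {'geneA': 'GGUUUGAUGAAA', 'geneB': 'GGUUUGAUGCCC'}
```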

The reason that a few organisms use trans-splicing is not known; however, it is thought that the common 5′ exon may aid in the translation of the mRNA. Thus, the products of trans-splicing in nematodes seem to be translated with especially high efficiency.

We have seen that the choice of splice sites depends on many features of the pre-mRNA transcript; these include the affinity of the three signals on the RNA (the 5′ and 3′ splice junctions and branch point) for the splicing machinery, the length and nucleotide sequence of the exon, the co-transcriptional assembly of the spliceosome, and the accuracy of the “bookkeeping” that underlies exon definition. So far we have emphasized the accuracy of the RNA splicing processes that occur in a cell. But it also seems that the mechanism has been selected for its flexibility, which allows the cell to try out new proteins on occasion. Thus, for example, when a mutation occurs in a nucleotide sequence critical for splicing of a particular intron, it does not necessarily prevent splicing of that intron altogether. Instead, the mutation typically creates a new pattern of splicing (Figure 6-35). Most commonly, an exon is simply skipped (Figure 6-35B). In other cases, the mutation causes a “cryptic” splice junction to be used (Figure 6-35C). Presumably, the splicing machinery has evolved to pick out the best possible pattern of splice junctions, and if the optimal one is damaged by mutation, it will seek out the next best pattern and so on. This flexibility in the process of RNA splicing suggests that changes in splicing patterns caused by random mutations have been an important pathway in the evolution of genes and organisms.

The plasticity of RNA splicing also means that the cell can easily regulate the pattern of RNA splicing. Earlier in this section we saw that alternative splicing can give rise to different proteins from the same gene. Some examples of alternative splicing are constitutive; that is, the alternatively spliced mRNAs are produced continuously by cells of an organism. However, in most cases, the splicing patterns are regulated by the cell so that different forms of the protein are produced at different times and in different tissues (see Figure 6-27). In Chapter 7 we return to this issue to discuss some specific examples of regulated RNA splicing.

When the spliceosome was first discovered, it puzzled molecular biologists. Why do RNA molecules instead of proteins perform important roles in splice site recognition and in the chemistry of splicing? Why is a lariat intermediate used rather than the apparently simpler alternative of bringing the 5′ and 3′ splice sites together in a single step, followed by their direct cleavage and rejoining? The answers to these questions reflect the way in which the spliceosome is believed to have evolved.

As discussed briefly in Chapter 1 (and taken up again in more detail in the final section of this chapter), it is thought that early cells used RNA molecules rather than proteins as their major catalysts and that they stored their genetic information in RNA rather than in DNA sequences. RNA-catalyzed splicing reactions presumably had important roles in these early cells. As evidence, some self-splicing RNA introns (that is, intron sequences in RNA whose splicing out can occur in the absence of proteins or any other RNA molecules) remain today—for example, in the nuclear rRNA genes of the ciliate Tetrahymena, in a few bacteriophage T4 genes, and in some mitochondrial and chloroplast genes.

A self-splicing intron sequence can be identified in a test tube by incubating a pure RNA molecule that contains the intron sequence and observing the splicing reaction. Two major classes of self-splicing intron sequences can be distinguished in this way. Group I intron sequences begin the splicing reaction by binding a G nucleotide to the intron sequence; this G is thereby activated to form the attacking group that will break the first of the phosphodiester bonds cleaved during splicing (the bond at the 5′ splice site). In group II intron sequences, an especially reactive A residue in the intron sequence is the attacking group, and a lariat intermediate is generated. Otherwise the reaction pathways for the two types of self-splicing intron sequences are the same. Both are presumed to represent vestiges of very ancient mechanisms (Figure 6-36).

For both types of self-splicing reactions, the nucleotide sequence of the intron is critical; the intron RNA folds into a specific three-dimensional structure, which brings the 5′ and 3′ splice junctions together and provides precisely positioned reactive groups to perform the chemistry (see Figure 6-6C). Based on the fact that the chemistries of their splicing reactions are so similar, it has been proposed that the pre-mRNA splicing mechanism of the spliceosome evolved from group II splicing. According to this idea, when the spliceosomal snRNPs took over the structural and chemical roles of the group II introns, the strict sequence constraints on intron sequences would have disappeared, thereby permitting a vast expansion in the number of different RNAs that could be spliced.

As previously explained, the 5′ end of the pre-mRNA produced by RNA polymerase II is capped almost as soon as it emerges from the RNA polymerase. Then, as the polymerase continues its movement along a gene, the spliceosome components assemble on the RNA and delineate the intron and exon boundaries. The long C-terminal tail of the RNA polymerase coordinates these processes by transferring capping and splicing components directly to the RNA as the RNA emerges from the enzyme. As we see in this section, as RNA polymerase II terminates transcription at the end of a gene, it uses a similar mechanism to ensure that the 3′ end of the pre-mRNA becomes appropriately processed.

As might be expected, the 3′ ends of mRNAs are ultimately specified by DNA signals encoded in the genome (Figure 6-37). These DNA signals are transcribed into RNA as the RNA polymerase II moves through them, and they are then recognized (as RNA) by a series of RNA-binding proteins and RNA-processing enzymes (Figure 6-38). Two multisubunit proteins, called CstF (cleavage stimulation factor F) and CPSF (cleavage and polyadenylation specificity factor), are of special importance. Both of these proteins travel with the RNA polymerase tail and are transferred to the 3′ end processing sequence on an RNA molecule as it emerges from the RNA polymerase. Some of the subunits of CPSF are associated with the general transcription factor TFIID, which, as we saw earlier in this chapter, is involved in transcription initiation. During transcription initiation, these subunits may be transferred from TFIID to the RNA polymerase tail, remaining associated there until the polymerase has transcribed through the end of a gene.

Once CstF and CPSF bind to specific nucleotide sequences on an emerging RNA molecule, additional proteins assemble with them to perform the processing that creates the 3′ end of the mRNA. First, the RNA is cleaved (see Figure 6-38). Next an enzyme called poly-A polymerase adds, one at a time, approximately 200 A nucleotides to the 3′ end produced by the cleavage. The nucleotide precursor for these additions is ATP, and the same type of 5′-to-3′ bonds are formed as in conventional RNA synthesis (see Figure 6-4). Unlike the usual RNA polymerases, poly-A polymerase does not require a template; hence the poly-A tail of eucaryotic mRNAs is not directly encoded in the genome. As the poly-A tail is synthesized, proteins called poly-A-binding proteins assemble onto it and, by a poorly understood mechanism, determine the final length of the tail. Poly-A-binding proteins remain bound to the poly-A tail as the mRNA makes its journey from the nucleus to the cytosol and they help to direct the synthesis of a protein on the ribosome, as we see later in this chapter.
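The two steps described above, cleavage followed by template-independent addition of A nucleotides, can be sketched as follows. The transcript and cleavage position are hypothetical, and a fixed tail length stands in for the poorly understood length control exerted by the poly-A-binding proteins.

```python
def cleave_and_polyadenylate(pre_mrna: str, cleavage_site: int, tail_length: int = 200) -> str:
    """Cut the transcript at cleavage_site and append a poly-A tail to the upstream piece."""
    if not 0 < cleavage_site <= len(pre_mrna):
        raise ValueError("cleavage site must lie inside the transcript")
    upstream = pre_mrna[:cleavage_site]  # the downstream piece is discarded (degraded in the nucleus)
    return upstream + "A" * tail_length  # no template is consulted: the tail is not encoded in the DNA


# Hypothetical transcript: the final 8 nucleotides lie beyond the cleavage site.
pre = "AUGGCCUAA" + "GCAUCA" + "GUGUUGUU"
mrna = cleave_and_polyadenylate(pre, cleavage_site=15, tail_length=10)
print(mrna)  # AUGGCCUAAGCAUCAAAAAAAAAAA
```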

After the 3′ end of a eucaryotic pre-mRNA molecule has been cleaved, the RNA polymerase II continues to transcribe, in some cases continuing as many as several hundred nucleotides beyond the DNA that contains the 3′ cleavage-site information. But the polymerase soon releases its grip on the template and transcription terminates; the piece of RNA downstream of the cleavage site is then degraded in the cell nucleus. It is not yet understood what triggers the loss in polymerase II processivity after the RNA is cleaved. One idea is that the transfer of the 3′ end processing factors from the RNA polymerase to the RNA causes a conformational change in the polymerase that loosens its hold on DNA; another is that the lack of a cap structure (and the CBC) on the 5′ end of the RNA that emerges from the polymerase somehow signals to the polymerase to terminate transcription.

We have seen how eucaryotic pre-mRNA synthesis and processing takes place in an orderly fashion within the cell nucleus. However, these events create a special problem for eucaryotic cells, especially those of complex organisms where the introns are vastly longer than the exons. Of the pre-mRNA that is synthesized, only a small fraction (the mature mRNA) is of further use to the cell. The rest (excised introns, broken RNAs, and aberrantly spliced pre-mRNAs) is not only useless but could be dangerous if it were not destroyed. How then does the cell distinguish between the relatively rare mature mRNA molecules it wishes to keep and the overwhelming amount of debris from RNA processing? The answer is that transport of mRNA from the nucleus to the cytoplasm, where it is translated into protein, is highly selective, being closely coupled to correct RNA processing. This coupling is achieved by the nuclear pore complex, which recognizes and transports only completed mRNAs.

We have seen that as a pre-mRNA molecule is synthesized and processed, it is bound by a variety of proteins, including the cap-binding complex, the SR proteins, and the poly-A binding proteins. To be “export-ready,” it seems that an mRNA must be bound by the appropriate set of proteins, with certain proteins such as the cap-binding complex being present, and others such as snRNP proteins absent. Additional proteins, placed on the RNA during splicing, seem to mark exon-exon boundaries and thereby signify completed splicing events. Only if the proper set of proteins is bound to an mRNA is it guided through the nuclear pore complex into the cytosol. As described in Chapter 12, nuclear pore complexes are aqueous channels in the nuclear membrane that directly connect the nucleoplasm and cytosol. Small molecules (less than 50,000 daltons) can diffuse freely through them. However, most of the macromolecules in cells, including mRNAs complexed with proteins, are far too large to pass through the pores without a special process to move them. An active transport of substances through the nuclear pore complexes occurs in both directions. As explained in Chapter 12, signals on the macromolecule determine whether it is exported from the nucleus (an mRNA, for example) or imported into it (an RNA polymerase, for example). For the case of mRNAs, the bound proteins that mark completed splicing events are of particular importance, as they are known to serve directly as RNA export factors (see Figure 12-16). mRNAs transcribed from genes that lack introns apparently contain nucleotide sequences that are directly recognized by other RNA export factors. Eucaryotic cells thus use their nuclear pore complexes as gates that allow only useful RNA molecules to enter the cytoplasm.

Of all the proteins that assemble on pre-mRNA molecules as they emerge from transcribing RNA polymerases, the most abundant are the hnRNPs (heterogeneous nuclear ribonucleoproteins). Some of these proteins (there are approximately 30 of them in humans) remove the hairpin helices from the RNA so that splicing and other signals on the RNA can be read more easily. Others package the RNA contained in the very long intron sequences typically found in genes of complex organisms (see Figure 6-33). Apart from histones, certain hnRNP proteins are the most abundant proteins in the cell nucleus, and they may play a particularly important role in distinguishing mature mRNA from processing debris. hnRNP particles (nucleosome-like complexes of hnRNP proteins and RNA; see Figure 6-33) are largely excluded from exon sequences, perhaps by prior binding of spliceosome components. They remain on excised introns and probably help mark them for nuclear retention and eventual destruction.

The export of mRNA-protein complexes from the nucleus can be observed with an electron microscope for the unusually abundant mRNA of the insect Balbiani Ring genes. As these genes are transcribed, the newly formed RNA is seen to be packaged by proteins (including hnRNP and SR proteins). This protein-RNA complex undergoes a series of structural transitions, probably reflecting RNA processing events, culminating in a curved fiber (Figure 6-39). This curved fiber then moves through the nucleoplasm and enters the nuclear pore complex (with its 5′ cap proceeding first), and it undergoes another series of structural transitions as it moves through the NPC. These and other observations reveal that the pre-mRNA-protein and mRNA-protein complexes are dynamic structures that gain and lose numerous specific proteins during RNA synthesis, processing, export, and translation (Figure 6-40).

Before discussing what happens to mRNAs after they leave the nucleus, we briefly consider how the synthesis and processing of noncoding RNA molecules occurs. Although there are many other examples, our discussion focuses on the rRNAs that are critically important for the translation of mRNAs into protein.

A few percent of the dry weight of a mammalian cell is RNA; of that, only about 3–5% is mRNA. A fraction of the remainder represents intron sequences before they have been degraded, but most of the RNA in cells performs structural and catalytic functions (see Table 6-1, p. 306). The most abundant RNAs in cells are the ribosomal RNAs (rRNAs), which constitute approximately 80% of the RNA in rapidly dividing cells. As discussed later in this chapter, these RNAs form the core of the ribosome. Unlike bacteria, in which all RNAs in the cell are synthesized by a single RNA polymerase, eucaryotes have a separate, specialized polymerase, RNA polymerase I, that is dedicated to producing rRNAs. RNA polymerase I is similar structurally to the RNA polymerase II discussed previously; however, the absence of a C-terminal tail in polymerase I helps to explain why its transcripts are neither capped nor polyadenylated. As discussed earlier, this difference helps the cell distinguish between noncoding RNAs and mRNAs.

Because multiple rounds of translation of each mRNA molecule can provide an enormous amplification in the production of protein molecules, many of the proteins that are very abundant in a cell can be synthesized from genes that are present in a single copy per haploid genome. In contrast, the RNA components of the ribosome are final gene products, and a growing mammalian cell must synthesize approximately 10 million copies of each type of ribosomal RNA in each cell generation to construct its 10 million ribosomes. Adequate quantities of ribosomal RNAs can be produced only because the cell contains multiple copies of the genes that code for ribosomal RNAs (rRNA genes). Even E. coli needs seven copies of its rRNA genes to meet the cell's need for ribosomes. Human cells contain about 200 rRNA gene copies per haploid genome, spread out in small clusters on five different chromosomes (see Figure 4-11), while cells of the frog Xenopus contain about 600 rRNA gene copies per haploid genome in a single cluster on one chromosome (Figure 6-41).
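A back-of-the-envelope calculation shows why a single rRNA gene could not keep up. The sketch below uses the figures quoted above and assumes, for illustration only, that every gene copy is equally active and that each transcript ends up in exactly one ribosome.

```python
RIBOSOMES_PER_GENERATION = 10_000_000  # from the text
HUMAN_RRNA_GENES_HAPLOID = 200         # from the text (approximate)


def transcripts_per_gene(ribosomes: int, gene_copies: int) -> float:
    """Transcripts each gene copy must supply per cell generation, assuming equal use of all copies."""
    return ribosomes / gene_copies


single_copy = transcripts_per_gene(RIBOSOMES_PER_GENERATION, 1)
haploid_set = transcripts_per_gene(RIBOSOMES_PER_GENERATION, HUMAN_RRNA_GENES_HAPLOID)
diploid_set = transcripts_per_gene(RIBOSOMES_PER_GENERATION, 2 * HUMAN_RRNA_GENES_HAPLOID)

print(single_copy)  # 10000000.0
print(haploid_set)  # 50000.0
print(diploid_set)  # 25000.0
```

Even with some 400 gene copies in a diploid cell, each copy would have to be transcribed about 25,000 times per generation under these assumptions.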

There are four types of eucaryotic rRNAs, each present in one copy per ribosome. Three of the four rRNAs (18S, 5.8S, and 28S) are made by chemically modifying and cleaving a single large precursor rRNA (Figure 6-42); the fourth (5S RNA) is synthesized from a separate cluster of genes by a different polymerase, RNA polymerase III, and does not require chemical modification. It is not known why this one RNA is transcribed separately.

Extensive chemical modifications occur in the 13,000-nucleotide-long precursor rRNA before the rRNAs are cleaved out of it and assembled into ribosomes. These include about 100 methylations of the 2′-OH positions on nucleotide sugars and 100 isomerizations of uridine nucleotides to pseudouridine (Figure 6-43A). The functions of these modifications are not understood in detail, but they probably aid in the folding and assembly of the final rRNAs and may also subtly alter the function of ribosomes. Each modification is made at a specific position in the precursor rRNA. These positions are specified by several hundred “guide RNAs,” which locate themselves through base-pairing to the precursor rRNA and thereby bring an RNA-modifying enzyme to the appropriate position (Figure 6-43B). Other guide RNAs promote cleavage of the precursor rRNAs into the mature rRNAs, probably by causing conformational changes in the precursor rRNA. All of these guide RNAs are members of a large class of RNAs called small nucleolar RNAs (or snoRNAs), so named because these RNAs perform their functions in a subcompartment of the nucleus called the nucleolus. Many snoRNAs are encoded in the introns of other genes, especially those encoding ribosomal proteins. They are therefore synthesized by RNA polymerase II and processed from excised intron sequences.
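The way a guide RNA "locates itself" by base-pairing can be sketched as a search for the stretch of precursor that is complementary to the guide. The sequences are hypothetical, only standard A-U and G-C pairs are considered, and the sketch leaves out how the paired region then positions the modifying enzyme.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Sequence that pairs with rna in antiparallel orientation (A-U and G-C pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def guide_target_site(precursor: str, guide: str) -> int:
    """Return the 0-based start of the precursor stretch that base-pairs with the guide, or -1 if none."""
    return precursor.find(reverse_complement(guide))


precursor = "GGCUAGCUUACGGAUCCA"  # hypothetical fragment of a precursor rRNA
guide = "CCGUAA"                  # hypothetical guide segment

site = guide_target_site(precursor, guide)
print(site, precursor[site:site + len(guide)])  # 7 UUACGG
```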

The nucleolus is the most obvious structure seen in the nucleus of a eucaryotic cell when viewed in the light microscope. Consequently, it was so closely scrutinized by early cytologists that an 1898 review could list some 700 references. We now know that the nucleolus is the site for the processing of rRNAs and their assembly into ribosomes. Unlike other organelles in the cell, it is not bound by a membrane (Figure 6-44); instead, it is a large aggregate of macromolecules, including the rRNA genes themselves, precursor rRNAs, mature rRNAs, rRNA-processing enzymes, snoRNPs, ribosomal protein subunits and partly assembled ribosomes. The close association of all these components presumably allows the assembly of ribosomes to occur rapidly and smoothly.

It is not yet understood how the nucleolus is held together and organized, but various types of RNA molecules play a central part in its chemistry and structure, suggesting that the nucleolus may have evolved from an ancient structure present in cells dominated by RNA catalysis. In present-day cells, the rRNA genes also have an important role in forming the nucleolus. In a diploid human cell, the rRNA genes are distributed into 10 clusters, each of which is located near the tip of one of the two copies of five different chromosomes (see Figure 4-11). Each time a human cell undergoes mitosis, the chromosomes disperse and the nucleolus breaks up; after mitosis, the tips of the 10 chromosomes coalesce as the nucleolus reforms (Figures 6-45 and 6-46). The transcription of the rRNA genes by RNA polymerase I is necessary for this process.

As might be expected, the size of the nucleolus reflects the number of ribosomes that the cell is producing. Its size therefore varies greatly in different cells and can change in a single cell, occupying 25% of the total nuclear volume in cells that are making unusually large amounts of protein.

A schematic diagram of the assembly of ribosomes is shown in Figure 6-47. In addition to its important role in ribosome biogenesis, the nucleolus is also the site where other RNAs are produced and other RNA-protein complexes are assembled. For example, the U6 snRNP, which, as we have seen, functions in pre-mRNA splicing (see Figure 6-29), is composed of one RNA molecule and at least seven proteins. The U6 snRNA is chemically modified by snoRNAs in the nucleolus before its final assembly there into the U6 snRNP. Other important RNA-protein complexes, including telomerase (encountered in Chapter 5) and the signal recognition particle (which we discuss in Chapter 12), are also believed to be assembled at the nucleolus. Finally, the tRNAs (transfer RNAs) that carry the amino acids for protein synthesis are processed there as well. Thus, the nucleolus can be thought of as a large factory at which many different noncoding RNAs are processed and assembled with proteins to form a large variety of ribonucleoprotein complexes.

Although the nucleolus is the most prominent structure in the nucleus, several other nuclear bodies have been visualized and studied (Figure 6-48). These include Cajal bodies (named for the scientist who first described them in 1906), GEMS (Gemini of coiled bodies), and interchromatin granule clusters (also called “speckles”). Like the nucleolus, these other nuclear structures lack membranes and are highly dynamic; their appearance is probably the result of the tight association of the protein and RNA (and perhaps DNA) components that take part in the synthesis, assembly, and storage of macromolecules involved in gene expression. Cajal bodies and GEMS resemble one another and are frequently paired in the nucleus; it is not clear whether they are truly distinct structures. They may be sites where snRNAs and snoRNAs undergo their final modifications and assembly with protein. Both the RNAs and the proteins that make up the snRNPs are partly assembled in the cytoplasm, but they are transported into the nucleus for their final modifications. It has been proposed that Cajal bodies/GEMS are also sites where the snRNPs are recycled and their RNAs are “reset” after the rearrangements that occur during splicing (see p. 322). In contrast, the interchromatin granule clusters have been proposed to be stockpiles of fully mature snRNPs that are ready to be used in splicing of pre-mRNAs (Figure 6-49).

Scientists have had difficulties in working out the function of the small subnuclear structures just described. Much of the progress now being made depends on genetic tools—examination of the effects of designed mutations in mice or of spontaneous mutations in humans. As one example, GEMS contain the SMN (survival of motor neurons) protein. Certain mutations of the gene encoding this protein are the cause of inherited spinal muscular atrophy, a human disease characterized by a wasting away of the muscles. The disease seems to be caused by a subtle defect in snRNP assembly and subsequent pre-mRNA splicing. More severe defects would be expected to be lethal.

Given the importance of nuclear subdomains in RNA processing, it might have been expected that pre-mRNA splicing would occur in a particular location in the nucleus, as it requires numerous RNA and protein components. However, we have seen that the assembly of splicing components on pre-mRNA is co-transcriptional; thus splicing must occur at many locations along chromosomes. We saw in Chapter 4 that interphase chromosomes occupy discrete territories in the nucleus, and transcription and pre-mRNA splicing must take place within these territories. However, interphase chromosomes are themselves dynamic and their exact positioning in the nucleus correlates with gene expression. For example, transcriptionally silent regions of interphase chromosomes are often associated with the nuclear envelope where the concentration of heterochromatin components is believed to be especially high. When these same regions become transcriptionally active, they relocate towards the interior of the nucleus, which is richer in the components required for mRNA synthesis. It has been proposed that, although a typical mammalian cell may be expressing on the order of 15,000 genes, transcription and RNA splicing may be localized to only several thousand sites in the nucleus. These sites themselves are highly dynamic and probably result from the association of transcription and splicing components to create small “assembly lines” where the local concentration of these components is very high. As a result, the nucleus seems to be highly organized into subdomains, with snRNPs, snoRNPs, and other nuclear components moving between them in an orderly fashion according to the needs of the cell (Figure 6-49).

Before the synthesis of a particular protein can begin, the corresponding mRNA molecule must be produced by transcription. Bacteria contain a single type of RNA polymerase (the enzyme that carries out the transcription of DNA into RNA). An mRNA molecule is produced when this enzyme initiates transcription at a promoter, synthesizes the RNA by chain elongation, stops transcription at a terminator, and releases both the DNA template and the completed mRNA molecule. In eucaryotic cells, the process of transcription is much more complex, and there are three RNA polymerases—designated polymerase I, II, and III—that are related evolutionarily to one another and to the bacterial polymerase.

Eucaryotic mRNA is synthesized by RNA polymerase II. This enzyme requires a series of additional proteins, termed the general transcription factors, to initiate transcription on a purified DNA template and still more proteins (including chromatin-remodeling complexes and histone acetyltransferases) to initiate transcription on its chromatin template inside the cell. During the elongation phase of transcription, the nascent RNA undergoes three types of processing events: a special nucleotide is added to its 5′ end (capping), intron sequences are removed from the middle of the RNA molecule (splicing), and the 3′ end of the RNA is generated (cleavage and polyadenylation). Some of these RNA processing events that modify the initial RNA transcript (for example, those involved in RNA splicing) are carried out primarily by special small RNA molecules.
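The three processing events can be pictured as string operations on a toy transcript. The sketch below is purely illustrative: the sequences, the intron coordinates, the cleavage position, the tail length, and the "m7G-" prefix standing in for the 5′ cap are all assumptions, and the function name is hypothetical. In the cell these events occur while the RNA is still being elongated, not as separate sequential steps.

```python
def process_pre_mrna(pre_mrna: str, introns: list[tuple[int, int]],
                     cleavage_site: int, poly_a_length: int = 5) -> str:
    """Toy model of capping, splicing, and 3'-end cleavage plus polyadenylation.

    `introns` holds half-open (start, end) index intervals, all lying before
    `cleavage_site`. Everything from `cleavage_site` onward is discarded.
    """
    # 3' end formation: cleave the transcript at the chosen site.
    body = pre_mrna[:cleavage_site]

    # Splicing: remove introns from right to left so earlier indices stay valid.
    for start, end in sorted(introns, reverse=True):
        body = body[:start] + body[end:]

    # Capping (placeholder label for the special 5' nucleotide) and poly-A tail.
    return "m7G-" + body + "A" * poly_a_length


# Invented example transcript: exon 1, one intron, exon 2, downstream sequence.
exon1, intron, exon2, downstream = "AUGGCC", "GUAAGUCAG", "UUCUAA", "GGGCCC"
pre_mrna = exon1 + intron + exon2 + downstream

mature = process_pre_mrna(
    pre_mrna,
    introns=[(len(exon1), len(exon1) + len(intron))],
    cleavage_site=len(pre_mrna) - len(downstream),
)
print(mature)  # prints m7G-AUGGCCUUCUAAAAAAA
```

The mature molecule keeps only the two exons, joined directly, between the cap and the added tail.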

For some genes, RNA is the final product. In eucaryotes, these genes are usually transcribed by either RNA polymerase I or RNA polymerase III. RNA polymerase I makes the ribosomal RNAs. After their synthesis as a large precursor, the rRNAs are chemically modified, cleaved, and assembled into ribosomes in the nucleolus—a distinct subnuclear structure that also helps to process some smaller RNA-protein complexes in the cell. Additional subnuclear structures (including Cajal bodies and interchromatin granule clusters) are sites where components involved in RNA processing are assembled, stored, and recycled.
