This article throws light upon the seven properties of the genetic code. The seven properties are:
(1) A Non-overlapping Code (2) Exceptions to the Code (3) Transfer of Information via the Genetic Code (4) Reading Frame of a Sequence (5) Start/Stop Codons (6) Degeneracy of the Genetic Code and (7) Variations to the Standard Genetic Code.
Our bodies need several thousand different proteins. It is the genetic material, the DNA, in our cells that provides the information needed to produce all these proteins. Though the linear sequence of nucleotides in DNA contains the information for protein sequences, proteins are not made directly from DNA. Instead, a messenger RNA (mRNA) molecule is synthesized from the DNA and directs the formation of the protein.
RNA is composed of four nucleotides, whose bases are adenine (A), guanine (G), cytosine (C), and uracil (U). Three adjacent nucleotides constitute a unit known as a codon, which codes for an amino acid. The genetic code consists of 64 such triplets (Fig. 6.1). With three exceptions (the stop codons), each codon encodes one of the 20 amino acids used in the synthesis of proteins.
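The count of 64 follows from taking every ordered triplet of the four bases. A minimal sketch in Python; the names BASES and codons are illustrative and not part of the original text:

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases (illustrative constant name)

# Every ordered triplet of bases is one possible codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # number of possible codons: 4 * 4 * 4
print(codons[:4])   # the first few codons, in U, C, A, G order
```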
Most of the amino acids are encoded by more than one codon. The deciphering of the genetic code was accomplished by the American biochemists Marshall W. Nirenberg, Robert W. Holley, and Har Gobind Khorana in the early 1960s. The codons for each amino acid were deciphered by using a variety of synthetic polyribonucleotides, which were added to a protein-synthesizing system isolated from E. coli.
Radioactive amino acids were added to the system and the protein synthesized was monitored. For example, poly A (A-A-A-A-A-A-A) led to polylysine being formed, and poly C (C-C-C-C-C-C-C) led to polyproline being formed, so AAA codes for lysine and CCC codes for proline. Development of this technique enabled the full genetic code to be deciphered.
Properties of the Genetic Code:
1. The code is read in non-overlapping groups of three mRNA nucleotides. Each group is called a codon.
2. There are no spaces or commas separating neighboring codons. This is like having a sentence in English consisting entirely of three-letter words with no spaces between the words. This property is especially important in understanding the effects of mutations on proteins.
3. The genetic code is redundant. There are 64 possible codons but only 20 amino acids.
4. There is a start codon, AUG, corresponding to the amino acid methionine. When translation begins, the first amino acid is normally methionine. In many proteins this initial methionine is removed after translation as part of processing the protein.
Note: methionine can be incorporated during peptide chain elongation and can occur in the protein (Fig. 6.2).
5. There are three non-coding stop or nonsense codons. These tell the machinery of translation that the end of the protein has been reached.
6. Not all amino acids have an equal number of codons. For example, tryptophan has one codon while arginine has six.
7. The code is almost universal. However, certain bacteria, mitochondria and protists have minor variations in their codes. The near universality of the code suggests that the code arose very early in the evolution of life.
Property # 1. A Non-overlapping Code:
The genetic code is read in groups (or “words”) of three nucleotides. After reading one triplet, the “reading frame” shifts over three letters, not just one or two. In the following example, after ACA the code would not be read as the overlapping triplets CAT and ATG. Rather, the code would be read ACA, TGA (Fig. 6.3).
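The difference between the two ways of reading can be sketched in a few lines of Python. The six-letter sequence ACATGA is an assumption reconstructed from the triplets quoted above (the figure itself is not reproduced here), and both function names are illustrative:

```python
def read_codons(seq):
    """Split seq into consecutive, non-overlapping triplets (partial tail dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def overlapping_triplets(seq):
    """Every triplet obtained by sliding one letter at a time (NOT how the code is read)."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


sequence = "ACATGA"  # assumed example sequence, inferred from the triplets in the text

print(read_codons(sequence))           # ['ACA', 'TGA']
print(overlapping_triplets(sequence))  # ['ACA', 'CAT', 'ATG', 'TGA']
```

The only difference is the step size: the non-overlapping reader advances three letters at a time, while the overlapping one advances a single letter.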
Property # 2. Exceptions to the Code:
The genetic code is almost universal. The same codons are assigned to the same amino acids and to the same START and STOP signals in the vast majority of genes in animals, plants, and microorganisms. However, some exceptions have been found. Most of these involve assigning one or two of the three STOP codons to an amino acid instead.
Property # 3. Transfer of Information via the Genetic Code:
The genome of an organism is inscribed in DNA, or in some viruses RNA. The portion of the genome that codes for a protein or an RNA is referred to as a gene. Those genes that code for proteins are composed of tri-nucleotide units called codons, each coding for a single amino acid. Each protein-coding gene is transcribed into a template molecule of the related polymer RNA, known as messenger RNA or mRNA. This in turn is translated on the ribosome into an amino acid chain or polypeptide.
The process of translation requires transfer RNAs specific for individual amino acids with the amino acids covalently attached to them, guanosine triphosphate as an energy source, and a number of translation factors. tRNAs have anticodons complementary to the codons in mRNA and can be “charged” covalently with amino acids at their 3′ terminal CCA ends.
Individual tRNAs are charged with specific amino acids by enzymes known as aminoacyl tRNA synthetases, which have high specificity for both their cognate amino acids and tRNAs. The high specificity of these enzymes is a major reason why the fidelity of protein translation is maintained. The standard genetic code is shown in Fig. 6.1.
Property # 4. Reading Frame of a Sequence:
Note that a codon is defined by the initial nucleotide from which translation starts. For example, the string GGGAAACCC, if read from the first position, contains the codons GGG, AAA, and CCC; if read from the second position, it contains the codons GGA and AAC; and if read starting from the third position, GAA and ACC.
Partial codons have been ignored in this example. Every sequence can thus be read in three reading frames, each of which will produce a different amino acid sequence (in the given example, Gly-Lys-Pro, Gly-Asn, or Glu-Thr, respectively).
With double-stranded DNA there are six possible reading frames, three in the forward orientation on one strand and three in the reverse orientation on the opposite strand. The actual frame in which a protein sequence is translated is determined by a start codon, usually the first AUG codon in the mRNA sequence.
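The three frames of GGGAAACCC, and the six frames of a double-stranded sequence, can be sketched as follows. This is an illustration only: the function names are made up for this example, and the codon table is built from the commonly used compact ordering of the standard code (bases in U, C, A, G order, with * marking a stop codon).

```python
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def frames(rna):
    """Return the codon lists for the three reading frames, ignoring partial codons."""
    return [[rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)] for offset in range(3)]


def translate(codons):
    """Translate a list of codons into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in codons)


def six_frames(dna):
    """Three frames on the given strand plus three on its reverse complement."""
    forward = dna.upper()
    reverse = forward.translate(DNA_COMPLEMENT)[::-1]
    return [f for strand in (forward, reverse) for f in frames(strand.replace("T", "U"))]


# One-letter codes: G = Gly, K = Lys, P = Pro, N = Asn, E = Glu, T = Thr
print([translate(f) for f in frames("GGGAAACCC")])  # ['GKP', 'GN', 'ET']
print(len(six_frames("GGGAAACCC")))                 # 6
```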
Property # 5. Start/Stop Codons:
Translation starts with a chain initiation codon (start codon). Unlike stop codons, the codon alone is not sufficient to begin the process. Nearby sequences and initiation factors are also required to start translation. In the standard code the start codon is AUG, which codes for methionine, so a newly synthesized amino acid chain normally starts with methionine.
The three stop codons have been given names: UAG is amber, UGA is opal (sometimes also called umber), and UAA is ochre. “Amber” was named by discoverers Richard Epstein and Charles Steinberg after their friend Harris Bernstein, whose last name means “amber” in German. The other two stop codons were named “ochre” and “opal” in order to keep the “color names” theme.
Stop codons are also called termination codons; they signal the release of the nascent polypeptide from the ribosome. This happens because release factors bind in the absence of cognate tRNAs with anticodons complementary to these stop signals.
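The role of start and stop codons can be illustrated with a deliberately simplified translator: it begins at the first AUG and stops at the first in-frame stop codon. It ignores the nearby sequences and initiation factors mentioned above, so it is a sketch of the reading logic rather than a model of real initiation. The function name and the example mRNA are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code in UUU, UUC, UUA, UUG, UCU, ... order; '*' marks UAA, UAG and UGA.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found: nothing is translated in this simplified model
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the chain is released
        peptide.append(amino_acid)
    return "".join(peptide)


# Made-up mRNA: GG | AUG AAA CCC UAA | GG
print(translate_orf("GGAUGAAACCCUAAGG"))  # MKP
```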
Property # 6. Degeneracy of the Genetic Code:
The genetic code has redundancy but no ambiguity. For example, although codons GAA and GAG both specify glutamic acid (redundancy), neither of them specifies any other amino acid (no ambiguity). Degenerate codons may differ in their third positions; e.g., both GAA and GAG code for the amino acid glutamic acid.
A codon is said to be fourfold degenerate if any nucleotide at its third position specifies the same amino acid; it is said to be twofold degenerate if only two of four possible nucleotides at its third position specify the same amino acid.
In twofold degenerate codons, the equivalent third position nucleotides are always either two purines (A/G) or two pyrimidines (C/U). Only two amino acids are specified by a single codon; one of these is the amino acid methionine, specified by the codon AUG, which also specifies the start of translation; the other is tryptophan, specified by the codon UGG.
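These statements about degeneracy can be checked against the standard codon table. The sketch below counts codons per amino acid and measures how many third-position bases keep the same amino acid; the function name third_position_degeneracy and the compact table layout are choices made for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code in UUU, UUC, UUA, UUG, UCU, ... order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons per amino acid (stop codons excluded).
codon_counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in codon_counts.items() if n == 1)


def third_position_degeneracy(codon):
    """How many of the four possible third bases give the same meaning as this codon."""
    meaning = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:2] + base] == meaning for base in BASES)


print(single_codon)                      # ['M', 'W'] (methionine and tryptophan)
print(codon_counts["R"])                 # 6 codons for arginine
print(third_position_degeneracy("GGU"))  # 4: fourfold degenerate (glycine)
print(third_position_degeneracy("GAA"))  # 2: twofold degenerate (glutamic acid)
```

Isoleucine (AUU, AUC, AUA) is a case the two categories above do not cover: three of the four third-position bases give the same amino acid.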
Degeneracy results because a triplet code must designate 20 amino acids and a stop signal, that is, at least 21 different meanings. Because there are four bases, triplet codons are the shortest that can produce at least 21 different codes. For example, if there were two bases per codon, then only 16 amino acids could be coded for (4² = 16). Three bases per codon give 4³ = 64 possible codons, more than the 21 required, meaning that some degeneracy must exist. These properties of the genetic code make it more fault-tolerant for point mutations.
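The counting argument can be written out as a short calculation. The function name min_codon_length is hypothetical; the numbers (four bases, 21 meanings) come from the paragraph above.

```python
def min_codon_length(n_bases, n_meanings):
    """Smallest codon length whose number of possible codons covers n_meanings."""
    length = 1
    while n_bases ** length < n_meanings:
        length += 1
    return length


for length in (1, 2, 3):
    print(length, 4 ** length)  # 1 -> 4, 2 -> 16, 3 -> 64 possible codons

print(min_codon_length(4, 21))  # 3: doublets (16) are too few, triplets (64) suffice
```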
Wobble Hypothesis:
There are 64 different triplet codons, and only 20 amino acids. Unless some amino acids are specified by more than one codon, some codons would be completely meaningless. Therefore, some redundancy is built into the system: some amino acids are coded for by multiple codons.
In some cases, the redundant codons are related to each other by sequence; for example, leucine is specified by the codons CUU, CUA, CUC, and CUG. Note how the codons are the same except for the third nucleotide position. This third position is known as the “wobble” position of the codon (Fig. 6.4). This is because in a number of cases, the identity of the base at the third position can wobble, and the same amino acid will still be specified (Table 6.1). This property allows some protection against mutation: if a mutation occurs at the third position of a codon, there is a good chance that the amino acid specified in the encoded protein won’t change.
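How much protection the third position offers can be estimated by trying every single-base substitution at each codon position and counting how often the amino acid stays the same. The sketch below does this for the 61 amino-acid-coding codons of the standard code; a change to a stop codon counts as a change. The function name synonymous_fraction is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard code in UUU, UUC, UUA, UUG, UCU, ... order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_fraction(position):
    """Fraction of single-base substitutions at `position` (0, 1 or 2) that keep the amino acid."""
    unchanged = total = 0
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # only start from amino-acid-coding codons
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            unchanged += CODON_TABLE[mutant] == amino_acid
    return unchanged / total


for position in range(3):
    print(position + 1, round(synonymous_fraction(position), 3))
```

Running it shows that substitutions at the third position leave the amino acid unchanged far more often than substitutions at the first or second position.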
In 1966, Francis Crick proposed the wobble concept to explain this phenomenon. The wobble rules do not permit any single tRNA molecule to recognize four different codons. Three codons can be recognized only when inosine occupies the first (5′) position of the anticodon.
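The wobble rules can be expressed as a small lookup table. The pairings below are the commonly cited form of Crick's rules for the first (5′) base of the anticodon (C pairs with G; A with U; U with A or G; G with C or U; inosine, written I, with U, C or A); the dictionary and function names are assumptions made for this sketch, which only handles inosine at the wobble position.

```python
# Codon third-position bases that each anticodon 5' base can pair with (wobble rules).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}
# Strict pairing used at the other two positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_recognized(anticodon):
    """Codons (written 5'->3') that an anticodon (written 5'->3') can read.

    Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    anticodon base 2 with codon base 2, and anticodon base 1 (the wobble
    base) with codon base 3.
    """
    wobble_base, middle, last = anticodon
    stem = WATSON_CRICK[last] + WATSON_CRICK[middle]
    return [stem + third for third in WOBBLE[wobble_base]]


print(codons_recognized("IGC"))  # ['GCU', 'GCC', 'GCA'], three alanine codons
print(codons_recognized("CAU"))  # ['AUG'], the methionine codon
print(max(len(bases) for bases in WOBBLE.values()))  # 3: no anticodon reads four codons
```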
Property # 7. Variations to the Standard Genetic Code:
Slight variations in the standard code were observed by researchers while studying human mitochondrial genes. They discovered that mitochondrial genes use some alternative codes. Such small variants were also seen in organisms such as Mycoplasma translating the codon UGA as tryptophan. In bacteria and Archaea, GUG and UUG are common start codons. However, in rare cases, certain specific proteins may use alternative initiation (start) codons not normally used by that species.
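A variant code can be modeled as the standard table with a few entries overridden. The sketch below reassigns UGA to tryptophan (W), as described above for Mycoplasma, and translates the same made-up message with both tables. The helper names with_overrides and translate and the example message are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code in UUU, UUC, UUA, UUG, UCU, ... order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def with_overrides(table, overrides):
    """Return a copy of a codon table with some codons reassigned."""
    variant = dict(table)
    variant.update(overrides)
    return variant


def translate(mrna, table):
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


UGA_AS_TRP = with_overrides(STANDARD, {"UGA": "W"})  # UGA read as tryptophan

message = "AUGUGAAAAUAA"  # made-up mRNA: AUG UGA AAA UAA
print(translate(message, STANDARD))    # M   (UGA acts as a stop codon)
print(translate(message, UGA_AS_TRP))  # MWK (UGA is read as tryptophan; UAA stops)
```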
In certain proteins, non-standard amino acids are inserted at standard stop codons, depending upon associated signal sequences in the messenger RNA: UGA can code for selenocysteine and UAG can code for pyrrolysine. Selenocysteine is now viewed as the 21st amino acid, and pyrrolysine is viewed as the 22nd.