Read this article to learn about the basic concepts, technology and applications of proteomics.
Basic Concepts of Proteomics:
The complete set of gene transcripts that an individual can make in a lifetime is termed the transcriptome, by analogy with the term genome, which refers to the haploid set of chromosomes carrying all the functional genes.
Similarly, all the proteins made by an organism are grouped under the term proteome, and their study is called proteomics. Proteomics involves the systematic study of proteins in order to provide a comprehensive view of their structure, function and role in the regulation of a biological system.
These studies include protein-protein interactions, protein modifications, protein function and protein localization. The aim of proteomics is not only to identify all the proteins in a cell but also to create a complete three-dimensional map of the cell indicating where proteins are located. Coupled with advances in bioinformatics, this approach to comprehensively describing biological systems will undoubtedly have a major impact on our understanding of the phenotype of both normal and diseased cells.
The proteome (a term coined by Marc Wilkins in 1994 and first used in print in 1995) of a given cell is the total set of proteins present at any given instant, and it is highly dynamic in response to internal and external cues. Proteins can be modified post-translationally, translocated within the cell, synthesized or degraded.
Therefore, examining the proteins of a cell at a particular time reflects the immediate protein environment in which it is studied. A cellular proteome is the collection of proteins found in a particular cell type under a particular set of environmental conditions, such as exposure to hormone stimulation. The complete set of proteins from all of the various cellular proteomes forms an organism’s complete proteome.
An interesting finding of the Human Genome Project is that there are far more proteins in the human proteome (~400,000 proteins) than there are protein-coding genes in the human genome (~22,000 genes). The large increase in protein diversity is thought to be due to alternative splicing and post-translational modification of proteins. This indicates that protein diversity cannot be fully characterized by gene expression analysis alone. Proteomics is thus a useful tool for characterizing cells and tissues of interest.
The first protein studies that can be called proteomics began with the introduction of two-dimensional gel electrophoresis of E. coli proteins (O’Farrell, 1975), followed by mouse and guinea pig protein studies (Klose, 1975). Although two-dimensional electrophoresis (2-DE) was a major step forward and many proteins could be separated and visualized by this technique, it was not sufficient for protein identification, which required a sensitive protein sequencing technology.
The first major technology for the identification of proteins was protein sequencing by Edman degradation (Edman, 1949). This technology was used to identify proteins from 2-D gels and to create the first 2-D database (Celis et al. 1987). Another important development in protein identification was mass spectrometry (MS) (Andersen et al. 2000). The use of MS for protein sequencing has increased because of its sensitivity, its tolerance of protein complexes and mixtures, and its amenability to high-throughput operation.
Despite these advances in protein identification (by MS or Edman sequencing), proteins could not be fully characterized without databases of large-scale DNA sequences, both expressed sequences and genomic DNA, because different protein isoforms can be generated from a single gene through several modifications (Fig. 18.1). The majority of DNA and protein sequences have been accumulated within a short period of time.
In 1995, the complete genome of a free-living organism was sequenced for the first time, in Haemophilus influenzae (Fleischmann et al. 1995). To date, the genomes of several eukaryotes have also been sequenced, viz. Arabidopsis thaliana (Tabata, 2000), Saccharomyces cerevisiae (Goffeau, 1996), Caenorhabditis elegans (Abbott, 1998), Oryza (Matsumoto, 2001) and human (Venter, 2001).
A common approach to expression profiling is the analysis of mRNA by methods including serial analysis of gene expression (SAGE) (Velculescu et al. 1995) and DNA microarray technology (Shalon, 1996). However, the level of transcription of a gene gives only a rough idea of the real level of expression of that gene.
An mRNA may be produced in abundance but degraded rapidly or translated inefficiently, keeping the amount of protein low. Once formed, proteins are also subject to post-translational modifications. Post-translational modifications, proteolysis and compartmentalization regulate protein function in the cell (Fig. 18.1).
The average number of protein forms per gene was predicted to be one or two in bacteria, three in yeast and three or more in humans (Wilkins et al. 1996). In response to extracellular signals, a number of proteins undergo post-translational modifications. Protein phosphorylation is an important signaling mechanism, and dysregulation of protein kinases and phosphatases can result in oncogenesis (Hunter, 1995).
Through proteome analysis, changes in the post-translational modifications of many proteins expressed by a cell can be analyzed. Another important feature of a protein is its localization in the cell. The mislocalization of proteins is known to have an adverse effect on cellular function, as in cystic fibrosis (Drumm and Collins, 1993). Cell growth, programmed cell death and the decision to proceed through the cell cycle are all regulated by signal transduction through protein complexes (Pippin et al. 1993). Protein interactions can be detected using the yeast two-hybrid system (Rain et al. 2001).
To Understand a Proteome, Three Distinct Types of Analysis must be Carried Out:
(1) Protein-expression proteomics is the quantitative study of the protein expression of the entire proteome or sub-proteome of two samples that differ by some variable. The identification of novel proteins in signal transduction and of disease-specific proteins are major outcomes of this approach.
(2) Structural proteomics attempts to identify all the proteins within a complex or organelle, determine their localization, and characterize all protein-protein interactions. The major goal of these studies is to map out the structure of protein complexes or cellular organelle proteins (Blackstock and Weir, 1999).
(3) Functional proteomics allows the study of a selected group of proteins involved in signaling pathways, diseases and protein-protein interactions. This may be done by isolating specific sub-proteomes by affinity chromatography for further analysis (Fig. 18.2).
Technology of Proteomics:
Measurement of the level of a gene transcript does not necessarily give a clear picture of the protein products formed. Therefore, to measure real gene expression, the proteins themselves should be analyzed. Before their identification and the measurement of their activity, all the proteins in a proteome at a given instant must be separated from each other.
A Typical Proteomics Experiment (e.g. Protein Expression Profiling) can be Divided into the following Categories:
(i) Separation and isolation of protein
(ii) The acquisition of protein structural information for protein identification and characterization
(iii) Database utilization.
(i) Protein Separation and Isolation:
An essential component of proteomics is protein electrophoresis, the most effective way to resolve a complex mixture of proteins. Two types of electrophoresis are available: one-dimensional and two-dimensional. In one-dimensional gel electrophoresis (1-DE), proteins are resolved on the basis of their molecular masses. Proteins are stable enough during 1-DE because of their solubility in sodium dodecyl sulphate (SDS). Proteins with molecular masses of 10-300 kDa can be easily separated by 1-DE.
With complex protein mixtures, however, the resolution of 1-DE is limited, so for more complex mixtures such as a crude cell lysate, the best separation tool available is two-dimensional gel electrophoresis (2-DE) (O’Farrell, 1975). Here, proteins are separated according to their net charge (isoelectric point) in the first dimension and according to their molecular mass in the second dimension.
As a single 2-DE gel can resolve thousands of proteins, it remains a powerful tool for the cataloging of proteins. Two-dimensional electrophoresis can resolve proteins that have undergone post-translational modifications, and the protein expression of any two samples can be compared quantitatively and qualitatively. The introduction of immobilized pH gradients to 2-DE greatly improved the reproducibility of this technique (Bjellqvist et al. 1993).
However, a few problems with 2-DE remain to be solved. Despite efforts to automate protein analysis by 2-DE, it is still a labour-intensive and time-consuming process. Other major limitations of 2-DE are the inability to detect low copy number proteins when a total cell lysate is analyzed (Link et al. 1997; Shevchenko et al. 1996) and the slowness of the in-gel digestion process.
Therefore, alternatives have been sought to bypass protein gel electrophoresis. One approach is proteolytic digestion of the protein mixture to convert it into peptides, which are then purified before being subjected to analysis by mass spectrometry (MS). Peptide purification has been simplified through liquid chromatography (Link et al. 1999; McCormack et al. 1997), capillary electrophoresis (Figeys et al. 1999; Tong et al. 1999) and reverse phase chromatography (Opiteck et al. 1997).
Juan et al. (2005) developed an approach to speed up the protein identification process using microwave technology. Proteins excised from the gels are subjected to trypsin digestion under microwave irradiation, which rapidly produces peptide fragments. These fragments can be analyzed by MALDI (matrix-assisted laser desorption/ionization). Despite much research on alternatives, 2-DE remains the most widely used technique for proteome studies.
(ii) Acquisition of Protein Structures: Protein Identification:
Edman Sequencing (ES):
One of the earliest methods used for protein identification was microsequencing by Edman chemistry to obtain N-terminal amino acid sequences. This technique was introduced by Edman in 1949. In Edman sequencing, the N-terminus of a protein is sequenced to determine its true start site. Edman sequencing is particularly applicable to the identification of proteins separated by SDS-polyacrylamide gel electrophoresis.
This method was used extensively in the early years of proteomics, but certain limitations have since emerged. One of the major limitations is the N-terminal modification of proteins. If a protein is blocked at its N-terminus before sequencing, it is very difficult to identify.
To overcome this problem a novel approach of mixed peptide sequencing (Damer et al. 1998) has been employed recently. In this approach, a protein is converted into peptides by cleavage with cyanogen bromide (CNBr) or skatole followed by the Edman sequencing of peptides.
Mass Spectrometry (MS):
The most significant breakthrough in proteomics has been the mass spectrometric identification of gel-separated proteins. Because of its high sensitivity, its ability to identify proteins in complexes and mixtures, and its high throughput, this technique has proved far better than ES.
In mass spectrometry, proteins are digested into peptides in the gel itself by a suitable protease such as trypsin, because intact proteins are difficult to elute from the gels. Moreover, the molecular weight of a protein is not usually sufficient for database identification. In contrast, peptides can be eluted from the gels easily, and matching even a small set of peptides to the database is quite sufficient to identify a protein.
There are Two Main Approaches to Mass Spectrometric Protein Identification:
(i) “Electrospray ionization” (ESI) involves the direct ionization of individual peptides through electrospray, followed by their fragmentation in a tandem mass spectrometer. In ESI, a liquid sample flows from a microcapillary tube into the orifice of the mass spectrometer, where a potential difference between the capillary and the inlet to the mass spectrometer results in the generation of a fine mist of charged droplets (Fenn et al. 1989; Hunt et al. 1981).
The tandem mass spectrometer has the ability to resolve peptides in a mixture, isolate one species at a time and dissociate it into amino- or carboxy-terminal-containing fragments designated ‘b’ and ‘y’, respectively.
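The ‘b’ and ‘y’ fragment series can be illustrated with a minimal Python sketch. It assumes singly charged fragments and standard monoisotopic residue masses; the function name `fragment_ions` and the example peptide are hypothetical and are not taken from any particular software package.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O (Da)
PROTON = 1.00728   # mass of a proton (Da)


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] holds the first i residues (amino-terminal fragment);
    y_ions[i-1] holds the last i residues (carboxy-terminal fragment).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


# Hypothetical three-residue peptide used only for illustration.
b, y = fragment_ions("GAK")
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

Differences between consecutive ions of one series equal single residue masses, which is why such a ladder can be read back as a sequence.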
(ii) In the “Peptide mass mapping” approach (Henzel et al. 1993), the mass spectrum of the eluted peptide mixture is acquired, which results in a peptide mass fingerprint of the protein being studied. The mass spectrum is obtained by a relatively simple mass spectrometric method, matrix-assisted laser desorption/ionization (MALDI).
In this approach, a tryptic peptide mixture is analyzed because trypsin cleaves proteins on the C-terminal side of arginine and lysine residues. As the tryptic peptides can be predicted theoretically for any protein, the predicted peptide masses can be compared with those obtained experimentally by MALDI analysis. If a sufficient number of peptides match an existing protein sequence in the database, the accuracy of protein identification is high.
After protease cleavage, the resulting peptides are subjected to mass analysis. Mass analysis follows the conversion of proteins or peptides into molecular ions. These ions are separated in a mass spectrometer on the basis of their mass/charge (m/z) ratio, which is determined by the time it takes for the ions to reach the detector. Hence the instrument is called a time-of-flight (TOF) instrument.
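The comparison of predicted and measured tryptic peptide masses can be sketched in a few lines of Python. This is a simplified illustration, not a real search engine: it cleaves after every lysine or arginine, ignores missed cleavages and the usual exception before proline, and uses an assumed mass tolerance. The function names, the sequence and the observed masses are all hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini (Da)


def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", sequence)


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tol=0.2):
    """Count observed masses that fall within tol Da of a predicted peptide."""
    predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - pred) <= tol for pred in predicted)
        for obs in observed_masses
    )


# Hypothetical sequence and measured masses, for illustration only.
print(tryptic_digest("GAKVLRSF"))
print(count_matches([274.16, 500.00], "GAKVLRSF"))
```

A candidate protein that explains more of the observed masses than any other database entry would be the preferred identification.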
The relationship that allows the m/z ratio to be determined is $E = \frac{1}{2}(m/z)v^{2}$. In this equation, E is the energy imparted on the charged ions as a result of the voltage that is applied by the instrument and v is the velocity of the ions down the flight path. As peptide ions are introduced into the collision chamber, they interact with collision gas and undergo fragmentation along the peptide backbone (Fig. 18.4).
Because all the ions are exposed to the same electric field, all similarly charged ions will have similar energies. Therefore, based on the above equation, ions that have larger mass must have lower velocities and hence will require longer times to reach the detector. Different steps involved in mass spectrometry are described in a flow chart in Fig. 18.3.
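The dependence of flight time on m/z can be made concrete with a small Python sketch of an idealized linear TOF instrument. It reads E in the relationship above as the energy per unit charge, that is, the accelerating voltage, and it assumes a field-free flight tube. The 20 kV voltage and 1 m tube length are hypothetical values chosen only for illustration.

```python
import math

DALTON_KG = 1.66053907e-27            # kilograms per dalton
ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs


def flight_time(mz, voltage=20_000.0, length=1.0):
    """Flight time in seconds for an ion of the given m/z (Da per charge).

    Derived from z*e*V = (1/2)*m*v**2 and t = length / v.
    """
    velocity = math.sqrt(2 * ELEMENTARY_CHARGE * voltage / (mz * DALTON_KG))
    return length / velocity


def mz_from_time(t, voltage=20_000.0, length=1.0):
    """Invert the relationship: recover m/z from a measured flight time."""
    velocity = length / t
    return 2 * ELEMENTARY_CHARGE * voltage / (DALTON_KG * velocity ** 2)


for mz in (500.0, 1000.0, 2000.0):
    print(f"m/z {mz:7.1f} -> {flight_time(mz) * 1e6:.2f} microseconds")
```

Because the flight time scales with the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector.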
(iii) Database Utilization:
Initially, the sequencing of proteins or peptides and the submission of these sequences created a collection of protein sequences called a protein database. The proteolytic digests of many proteins have also been predicted theoretically and deposited in databases. Hence, so much information has now accumulated that we can search for homology between a new peptide sequence and the existing sequences in a database to identify the protein.
The major goal of database searching is to identify a large number of proteins quickly and accurately. All the information accumulated through Edman sequencing or mass spectrometry is used to identify the proteins. In peptide mass fingerprinting database searching, the mass of an unknown peptide after proteolytic digestion is compared to the predicted masses of peptides from the theoretical digestion of proteins in the database. In amino acid sequence database searching, the sequence of amino acids from a peptide is determined and used to search databases for the protein from which it was derived.
Collections of protein sequence databases are thus designed to represent a partial list of an organism’s genome, that is, the genes and all of the proteins they encode. The protein families are usually classified according to their evolutionary history inferred from sequence homology.
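Amino acid sequence database searching can be illustrated, in its simplest form, as an exact substring lookup. The sketch below is only a toy: the database entries and their names are invented, and real search tools also tolerate mismatches and score partial similarity, which this example does not attempt.

```python
# Hypothetical toy database; identifiers and sequences are invented.
DATABASE = {
    "protein_A": "MGAKVLRSFTPEK",
    "protein_B": "MSTNDEQKLLRGA",
    "protein_C": "MWYFHKVLRSFGG",
}


def find_proteins(peptide, database):
    """Return the sorted identifiers of all proteins containing the peptide."""
    return sorted(name for name, seq in database.items() if peptide in seq)


print(find_proteins("VLRSF", DATABASE))    # peptide shared by two entries
print(find_proteins("TNDEQK", DATABASE))   # peptide unique to one entry
```

A peptide found in several entries cannot identify a protein on its own, which is one reason why several peptides per protein are normally matched.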
These databases are excellent tools for gene discovery, comparative genomics and molecular evolution. The purpose of database similarity searching is the sensitive detection of sequence homologues, regardless of the species relationship in order to infer similarity of function from similarity of sequence.
Recently, chromatography-based proteomics has been used to measure the concentration of low molecular weight peptides in complex mixtures such as plasma or sera. These technologies use time-of-flight (TOF) mass spectrometry with matrix-assisted or surface-enhanced laser desorption/ionization to produce a spectrum of mass-to-charge (m/z) ratios that can be analysed in order to identify unique signatures from the chromatography pattern.
Applications of Proteomics:
1. Post-Translational Modifications:
A distinctive feature of proteomics is its ability to analyze post-translational modifications of proteins. These modifications include phosphorylation, glycosylation and sulphation, as well as other modifications involved in maintaining the structure of a protein.
These modifications are very important for the activity, solubility and localization of proteins in the cell. Determining protein modifications is much more difficult than identifying proteins. For identification, only a few peptides from a protease digest are required, which are then matched against known sequences in a database. To determine the modifications of a protein, however, much more material is needed, because all the peptides that do not have the expected molecular mass need to be analyzed further.
For example, in protein phosphorylation, phosphopeptides are 80 Da heavier than their unmodified counterparts. They also give rise to a specific fragment (PO3−, mass 79), bind to metal resins and are recognized by specific antibodies, and the phosphate group can later be removed by phosphatases (Clauser et al. 1999; Colledge and Scott, 1999). The protein of interest (the post-translationally modified protein) can therefore be detected by 32P-labelling or by Western blotting with antibodies that recognize only the active state of the molecule. Later, these spots can be identified by mass spectrometry.
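The 80 Da shift can be used to flag candidate phosphopeptides by comparing a measured peptide mass with the mass predicted for the unmodified sequence. The following Python sketch assumes the monoisotopic shift of one HPO3 group (about 79.966 Da); the tolerance, the maximum number of sites and the example masses are hypothetical.

```python
PHOSPHO_SHIFT = 79.96633  # monoisotopic mass added by one HPO3 group (Da)


def count_phosphates(observed_mass, unmodified_mass, max_sites=3, tol=0.02):
    """Return how many phosphate groups explain the mass difference.

    Returns None when the difference is not a multiple of the phosphate
    shift within the tolerance (for up to max_sites groups).
    """
    delta = observed_mass - unmodified_mass
    for n in range(max_sites + 1):
        if abs(delta - n * PHOSPHO_SHIFT) <= tol:
            return n
    return None


# Hypothetical masses for illustration only.
print(count_phosphates(1079.97, 1000.00))  # one phosphate group
print(count_phosphates(1040.00, 1000.00))  # unexplained difference
```

A mass match alone only suggests phosphorylation; the text above describes the fragment ion, metal resin, antibody and phosphatase evidence that would support it.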
2. Protein-Protein Interactions:
A major contribution of proteomics is the development of protein interaction maps of the cell, which are of immense value for understanding cell biology. Knowledge about the time of expression of a particular protein, its level of expression and, finally, its interaction with another protein to form an intermediate that performs a specific biological function can now be obtained.
These intermediates can be exploited for therapeutic purposes also. An attractive way to study the protein-protein interactions is to purify the entire multi-protein complex by affinity based methods using GST-fusion proteins, antibodies, peptides etc.
The yeast two-hybrid system has emerged as a powerful tool to study protein-protein interactions (Haynes and Yates, 2000). According to Pandey and Mann (2000), it is a genetic method based on the modular structure of transcription factors, in which close proximity of the DNA-binding domain to the activation domain induces increased transcription of a set of genes.
The yeast two-hybrid system uses ORFs fused to the DNA-binding or activation domain of GAL4 such that increased transcription of a reporter gene results when the proteins encoded by two ORFs interact in the nucleus of the yeast cell. One of the main consequences of this is that once a positive interaction is detected, simply sequencing the relevant clones identifies the ORF. For this reason it is a generic method that is simple and amenable to high-throughput screening of protein-protein interactions.
Phage display is a method where bacteriophage particles are made to express either a peptide or protein of interest fused to a capsid or coat protein. It can be used to screen for peptide epitopes, peptide ligands, enzyme substrate or single chain antibody fragments.
Another important method to detect protein-protein interactions involves the use of fluorescence resonance energy transfer (FRET) between fluorescent tags on interacting proteins. FRET is a non-radiative process whereby energy from an excited donor fluorophore is transferred to an acceptor fluorophore. After excitation of the first fluorophore, FRET is detected either by emission from the second fluorophore using appropriate filters or by alteration of the fluorescence lifetime of the donor.
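The lifetime-based readout can be turned into a number with the standard Förster relations, which are not given in this article and are used here as assumptions: the efficiency is one minus the ratio of the donor lifetime with and without the acceptor, and it falls off with the sixth power of the donor-acceptor distance relative to the Förster radius. The function names and all numerical values in this Python sketch are hypothetical.

```python
def fret_efficiency_from_lifetime(tau_donor_alone, tau_donor_with_acceptor):
    """FRET efficiency from donor fluorescence lifetimes (same time units)."""
    return 1.0 - tau_donor_with_acceptor / tau_donor_alone


def fret_efficiency_from_distance(r, r0):
    """FRET efficiency at donor-acceptor distance r for Forster radius r0."""
    return 1.0 / (1.0 + (r / r0) ** 6)


# Hypothetical donor lifetimes in nanoseconds.
print(fret_efficiency_from_lifetime(2.5, 2.0))

# Hypothetical distances in nanometres, with an assumed Forster radius of 5 nm.
for r in (2.5, 5.0, 10.0):
    print(r, round(fret_efficiency_from_distance(r, 5.0), 3))
```

The steep distance dependence is what makes FRET useful as an indicator that two tagged proteins are in close contact rather than merely in the same compartment.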
A proteomics strategy of increasing importance involves the localization of proteins in cells as a necessary first step towards understanding protein function in complex cellular networks. The discovery of GFP (green fluorescent protein) and the development of its spectral variants has opened the door to analysis of proteins in living cells by use of the light microscope.
Large-scale approaches of localizing GFP-tagged proteins in cells have been performed in the genetically amenable yeast S. pombe (Ding et al. 2000) and in Drosophila (Morin et al. 2001). To localize proteins in mammalian cells, a strategy was developed that enables the systematic GFP tagging of ORFs from novel full-length cDNAs that are identified in genome projects.
3. Protein Expression Profiling:
The largest application of proteomics continues to be protein expression profiling. The expression levels of the proteins in a sample can be measured by 2-DE or by newer techniques such as isotope-coded affinity tags (ICAT). Using these approaches, the differing levels of expression in two protein samples can also be analyzed.
This application of proteomics would be helpful in identifying the signaling mechanisms as well as disease specific proteins. With the help of 2-DE several proteins have been identified that are responsible for heart diseases and cancer (Celis et al. 1999). Proteomics helps in identifying the cancer cells from the non-cancerous cells due to the presence of differentially expressed proteins.
The isotope-coded affinity tag technique has opened new horizons in the field of proteomics. It involves labeling the proteins from two different sources with two chemically identical reagents that differ in mass because of their isotope composition (Gygi et al. 1999). The biggest advantage of this technique is that it eliminates the need for protein quantitation by 2-DE. Therefore, a large amount of protein sample can be used to enrich low-abundance proteins.
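Relative quantitation with isotope-coded tags comes down to pairing each light-labelled peptide peak with its heavy-labelled partner and taking the intensity ratio. The Python sketch below assumes a light/heavy reagent pair that differs by eight deuterium atoms (about 8.05 Da) and peptides carrying a single tag; the function name, the tolerance and the peak list are hypothetical.

```python
# Assumed mass difference between heavy and light tags: 8 x (2H - 1H), in Da.
HEAVY_LIGHT_DELTA = 8.0502


def icat_ratios(peaks, tol=0.02):
    """Pair light and heavy peaks and return (light_mass, heavy/light ratio).

    peaks is a list of (mass, intensity) tuples from one spectrum.
    Only peptides carrying a single tag are considered in this sketch.
    """
    ratios = []
    for light_mass, light_intensity in peaks:
        for heavy_mass, heavy_intensity in peaks:
            if abs(heavy_mass - light_mass - HEAVY_LIGHT_DELTA) <= tol:
                ratios.append((light_mass, heavy_intensity / light_intensity))
    return ratios


# Hypothetical peak list: one labelled pair and one unpaired peak.
peaks = [(1000.00, 200.0), (1008.05, 400.0), (1500.00, 50.0)]
print(icat_ratios(peaks))
```

A ratio above one here would indicate that the peptide is more abundant in the sample labelled with the heavy reagent.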
Different methods have been used to probe genomic sets of proteins for biochemical activity. One method is called a biochemical genomics approach, which uses parallel biochemical analysis of a proteome comprised of pools of purified proteins in order to identify proteins and the corresponding ORFs responsible for a biochemical activity.
The second approach for analyzing genomic sets of proteins is the use of functional protein microarrays, in which individually purified proteins are separately spotted on a surface such as a glass slide and then analyzed for activity. This approach has huge potential for rapid high-throughput analysis of proteomes and other large collections of proteins, and promises to transform the field of biochemical analysis.
4. Molecular Medicine:
With the help of the information available through clinical proteomics, several drugs have been designed. Clinical proteomics aims to discover proteins of medical relevance in order to identify potential targets for pharmaceutical development, markers for disease diagnosis or staging, and indicators for risk assessment, in both medical and environmental studies. Proteomic technologies will play an important role in drug discovery, diagnostics and molecular medicine because of the link between genes, proteins and disease.
As researchers study defective proteins that cause particular diseases, their findings will help develop new drugs that either alter the shape of a defective protein or mimic a missing one. Already, many of the best-selling drugs today either act by targeting proteins or are proteins themselves. Advances in proteomics may help scientists eventually create medications that are “personalized” for different individuals to be more effective and have fewer side effects. Current research is looking at protein families linked to disease including cancer, diabetes and heart disease.