Read this article to learn about the meaning, distribution, origin, applications, functional importance, and role as genetic markers of Single Nucleotide Polymorphisms (SNPs), and about the Indian initiative to catalogue them.
Meaning of Single Nucleotide Polymorphism (SNPs):
A Single Nucleotide Polymorphism or SNP (pronounced ‘snip’) is a small genetic change, or variation, that can occur within a DNA sequence.
The four nucleotide letters A (adenine), C (cytosine), T (thymine), and G (guanine) specify the genetic code.
SNP variation occurs when a single nucleotide, such as an A, replaces one of the other three nucleotide letters – C, G, or T (Fig. 20-1).
By the classical definition of polymorphism, a variation must occur at a frequency of at least 1% for the nucleotide change to qualify as a polymorphism. Nucleotide changes that occur at a frequency below 1% are called rare variants.
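The 1% rule above can be written as a small check. This is only an illustrative sketch: the function name `classify_variant` and the allele counts are hypothetical and do not come from any study.

```python
def classify_variant(variant_allele_count, total_alleles, threshold=0.01):
    """Label a nucleotide change by its allele frequency.

    threshold=0.01 is the classical 1% cut-off described in the text.
    """
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= variant_allele_count <= total_alleles:
        raise ValueError("variant_allele_count must lie between 0 and total_alleles")
    frequency = variant_allele_count / total_alleles
    return "polymorphism" if frequency >= threshold else "rare variant"


# Hypothetical example: 100 diploid individuals carry 200 alleles in total.
common_label = classify_variant(3, 200)  # allele frequency 0.015
rare_label = classify_variant(1, 200)    # allele frequency 0.005
print(common_label, rare_label)
```

Counting alleles rather than individuals matters here, because each diploid individual contributes two alleles to the total.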
Because only about 1.1 to 1.4% of a person’s DNA sequence codes for proteins, most SNPs are found outside coding sequences. SNPs lying outside the coding region would not normally be expected to affect the phenotype of an organism. SNPs found within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein, although their effects are generally much less drastic than those of mutations.
Owing to recent advances in gene identification and characterization, there has been a flurry of SNP discovery. Finding single nucleotide changes throughout the human genome seems a mammoth job, but over the last 20 years researchers have developed a number of techniques that make it possible.
Each technique uses a similar, though not identical, method to compare selected regions of DNA sequence obtained from multiple individuals who share a common trait. In each test, a SNP is detected when the sequence from one individual in the pool differs from the others at a single position.
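The comparison described above can be sketched in a few lines, assuming the sequences have already been aligned to the same length. The function name `find_snp_sites` and the four toy sequences are hypothetical; real pipelines also handle alignment, sequencing error, and insertions or deletions, which this sketch ignores.

```python
from collections import Counter


def find_snp_sites(aligned_seqs):
    """Return {position: Counter of bases} for every column with more than one base."""
    if not aligned_seqs:
        return {}
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be pre-aligned to the same length")
    sites = {}
    for pos in range(length):
        column = Counter(seq[pos] for seq in aligned_seqs)
        if len(column) > 1:  # at least two different bases observed at this position
            sites[pos] = column
    return sites


# Toy data: the third individual carries an A where the others carry a G.
samples = ["ACGTAC", "ACGTAC", "ACATAC", "ACGTAC"]
snp_sites = find_snp_sites(samples)
print(snp_sites)
```

Positions are reported 0-based, so the variable site in the toy data is position 2.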
Distribution of SNPs:
SNPs are not distributed uniformly over the genome. A huge number of SNPs are found throughout the non-coding regions of the genome. Since these regions are largely free from selection pressure, changes in them are selectively neutral and can become fixed over time. The distribution pattern of SNPs varies even within a single chromosome.
For instance, the regions responsible for antigen presentation to the immune system, located on chromosome 6, show very high nucleotide variability in contrast to other regions of the same chromosome.
The Origin, Survival and Fixation of SNPs:
SNPs are the main source of variation in the genome and account for about 90% of all human polymorphisms.
There are Two Types of Nucleotide Base Substitution:
Transition:
Transition, which accounts for nearly two-thirds of all SNPs, occurs between purines (e.g. A > G) or between pyrimidines (e.g. C > T).
Transversion:
Transversion occurs between purines and pyrimidines (e.g. A > C and G > T).
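The two substitution classes can be told apart by checking whether both bases belong to the same chemical class. The following is a minimal sketch; the function name `substitution_type` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, so there is no substitution")
    # Same class (purine to purine, or pyrimidine to pyrimidine) means transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, ">", alt, substitution_type(ref, alt))
```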
A SNP undergoes a series of selection steps before finally being established.
Its Life can be Roughly Divided into 4 Phases:
1) Appearing by the means of point mutations.
2) Surviving the selection pressure of the nature.
3) Spreading through generations.
4) Establishing itself at least as 1% of all alleles.
The most frequent change in humans is the mutation from CpG to TpG (a transition accounting for approximately 25% of all mutations). This mechanism decreases the number of CG dinucleotides in the genome, since many eventually become TG, whereas new CpG sites are created only by other, less frequent mutations.
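One common way to quantify this loss of CG dinucleotides is the observed/expected CpG ratio, which compares the number of CG dinucleotides found with the number expected from the C and G content alone. The ratio is not described in the text above and is used here as an assumption for illustration; the function name `cpg_obs_exp` and the toy sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (count of CG * length) / (count of C * count of G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both C and G
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


# Toy sequences: the second is the first after every CG has mutated to TG.
before = "ACGTCGACGT"
after = before.replace("CG", "TG")
print(cpg_obs_exp(before), cpg_obs_exp(after))
```

A ratio well below 1 indicates that CG dinucleotides are rarer than the base composition alone would predict.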
Since only 1.1% to 1.4% of the genome codes for proteins, SNPs are more likely to occur in non-coding sequences. Even when a SNP occurs in a coding sequence, it will mostly have a subtle and non-deleterious effect on the expressed protein. Changes with deleterious effects are eventually removed from the genome by natural selection. Hence, to attain the status of a SNP, a point mutation must be sufficiently non-deleterious to survive selection (Miller and Kwok, 2001).
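Whether a coding SNP changes the protein at all depends on the codon it falls in. The sketch below uses the standard genetic code to label a single-base change as synonymous, missense, or nonsense. The function name `snp_effect` and the example codons are hypothetical and chosen only for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position; "*" is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(codon, position, alt_base):
    """Classify the effect of replacing codon[position] (0-based) with alt_base."""
    codon, alt_base = codon.upper(), alt_base.upper()
    if codon not in CODON_TABLE:
        raise ValueError("codon must be three DNA bases")
    if position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("position must be 0-2 and alt_base one of A, C, G, T")
    if codon[position] == alt_base:
        raise ValueError("alt_base equals the reference base")
    mutated = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid, protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    if ref_aa == "*":
        return "stop-lost"   # a stop codon now encodes an amino acid
    return "missense"        # a different amino acid


print(snp_effect("CTT", 2, "C"))  # CTT and CTC both encode leucine
print(snp_effect("TGC", 0, "C"))  # TGC (cysteine) becomes CGC (arginine)
print(snp_effect("TGG", 2, "A"))  # TGG (tryptophan) becomes TGA (stop)
```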
Genetic Predisposition:
Most common diseases in humans are not caused by a genetic variation within a single gene, but are influenced by complex interactions among multiple genes as well as environmental and lifestyle factors. Although both environmental and lifestyle factors contribute to the phenotype of a disease, it is difficult to measure and evaluate their overall effect on a disease process.
The probability that an individual will develop a disease on the basis of genes and hereditary factors is referred to as genetic predisposition. Genetic factors confer susceptibility or resistance to a disease and influence its severity or progression.
Most of the predisposition factors are still unknown, and researchers have found it difficult to develop screening tests for most diseases and disorders. The phenotypic association of certain coding SNPs with a disorder involving the same gene has helped to identify the functional role of those SNPs.
Single nucleotide polymorphisms can also be used as a tool for identifying genes responsible for a disease, or genes that impart a particular phenotype of the disease. By studying stretches of DNA sequence that have been found to harbor a SNP associated with a disease trait, researchers may begin to reveal relevant genes associated with the disease.
Understanding the role of genetic factors in disease will also allow researchers to better evaluate the role of non-genetic factors, such as habitat, upbringing, behavior, diet, lifestyle, and physical activity, in the disease.
As genetic factors also affect an individual’s response to a drug therapy, SNPs will be useful in helping researchers determine and understand why individuals differ in their abilities to metabolize certain drugs, as well as to determine why an individual may experience an adverse side effect to a particular drug.
Therefore, the recent discovery of SNPs promises to revolutionize not only the process of disease detection, but also the practice of personalized, preventative and curative medicine.
Application of SNP in Pharmacogenomics Studies:
Response rates to major, commonly used drugs vary markedly among individuals (Table 1). SNPs contribute in a major way to this phenomenon. Using SNPs to study the genetics of drug response has the potential to help in the creation of personalized medicine, as explained in Fig. 20.2. As mentioned earlier, SNPs may also be associated with the metabolism, i.e. the absorption and clearance, of therapeutic agents.
Currently, there is no standard genetic screening of drug-metabolizing genes to determine how a patient will respond to a particular medication. A treatment proven effective in one patient may be ineffective in others. Some patients may also experience an adverse immunological reaction to a particular drug.
Hence pharmaceutical companies limit their production to drugs to which an ‘average’ patient will respond. As a result, a relatively small group of patients harboring a genetic variation (e.g. a SNP) that renders them unable to metabolize that drug remains untreated. Many drugs that might benefit that small group of patients never make it to market because they would bring less profit to the drug industry.
*SSRI = Selective Serotonin Reuptake Inhibitors
The data presented in the above table (taken from the Physicians’ Desk Reference, 54th edn., Medical Economics Company, 2000) show the limited efficacy of prescribed drugs in ameliorating disease among affected individuals and underscore the importance of exploring personalized medicine based on the genetic makeup of individuals.
The post-genomic era has revealed associations, direct or indirect, between SNPs and certain human diseases. For example, genetic studies have shown relationships between:
(a) SNPs in the coagulation factor gene F5 and deep-vein thrombosis,
(b) Genetic alteration in the chemokine receptor gene CCR5 and susceptibility to HIV infection.
Relationships between a host of other SNPs and diseases have also been reported (McCarthy and Hilfiker, 2000).
These associations exemplify the candidate gene approach and can therefore help to identify predisposition to a disease in an individual (Table 2). Similar associations are also seen between SNPs and variation in drug response among individuals (Table 3). For example, genetic variants in a drug-metabolizing enzyme, thiopurine methyltransferase (TPMT), have been linked to adverse drug reactions (Snow & Gibson, 1995); similarly, variation in the ALOX5 promoter modulates the response to anti-asthma treatment (Drazen et al. 1999). SNPs in the apolipoprotein E (APOE) gene have been associated with the response to cholinesterase inhibitors in Alzheimer’s patients.
Direct effects of SNPs are also seen in various common diseases. Recently, SNPs responsible for an increased risk of diabetes have been identified. A common genetic variant caused by a SNP in the peroxisome proliferator-activated receptor (PPAR) gamma gene, present in around 25% of type 2 diabetes patients in the population, is thought to make individuals more prone to diabetes (Altshuler et al. 2000).
Although these variations have only a modest effect on individual risk, they affect a major portion of the human population. In contrast to the identification of SNPs directly involved in disease, the identification and use of SNPs in genes responsible for drug metabolism and detoxification remain limited.
As drug response depends upon the administration of the drug, ‘responders’ and ‘non-responders’ can only be identified after they receive the drug. This makes it difficult to identify such individuals in a population (McCarthy and Hilfiker, 2000). A correlative study of the effect of a drug on a large population in which the response is already known, followed by SNP screening of suspect genes and statistical interpretation, may prove useful in understanding the SNP-phenotype relationship.
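The statistical interpretation step can take many forms. One simple possibility, assumed here and not prescribed by the source, is a Pearson chi-square test on a 2×2 table of SNP carrier status against drug response. The function name `chi_square_2x2` and all counts below are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table

                      responder   non-responder
        carrier           a             b
        non-carrier       c             d
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical screening result for 100 treated patients.
statistic = chi_square_2x2(30, 10, 20, 40)
CRITICAL_VALUE_5_PERCENT = 3.841  # chi-square critical value for 1 d.f. at p = 0.05
print(round(statistic, 2), statistic > CRITICAL_VALUE_5_PERCENT)
```

A statistic above the critical value suggests that carrier status and response are not independent in the sample. It does not by itself show that the SNP causes the difference, and a real study would also need to correct for multiple testing and population structure.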
In the future, the most appropriate drug for an individual could be determined by analyzing the patient’s SNP profile. The ability to target a drug to those individuals most likely to benefit, referred to as ‘personalized medicine’, would allow pharmaceutical companies to bring many more drugs to market and allow doctors to prescribe individualized therapies specific to a patient’s needs.
The information can be integrated with other resources such as structure of proteins. Using a computational approach, coding SNPs can be plotted on the protein 3D structure and the observed changes can be correlated with the phenotype.
The application of structural data to research on genetic variation is of immense use for studies on the genetic basis of phenotypic variation. The pharmaceutical industry can profit from a ‘SNP and mutation database’ of drug-metabolizing genes containing information on the structure of each polymorphism, the percentage of the population bearing the variant genotypes, and its phenotypic effect in response to drug treatment.
Structural and Functional Importance of SNPs:
In addition to SNPs occurring in the coding sequence of genes, functionally important SNPs have also been observed in non-coding DNA (e.g. introns), including regulatory regions (e.g. promoters and enhancers). One good example of a functional SNP in a non-coding region is found in the tau gene. The structure of the tau exon 10 splicing regulatory element RNA has recently been deciphered and shown to form a stable, folded stem-loop structure.
Other examples are:
Intronic SNP affecting Splice sites:
Coding-region and intronic mutations in the tau gene cause frontotemporal dementia and parkinsonism linked to chromosome 17. Intronic mutations and some missense mutations increase the splicing-in of exon 10, leading to an increased ratio of four-repeat to three-repeat tau isoforms (Varani et al. 1999).
Promoter SNPs Affecting Gene Expression:
SNPs can affect gene expression if they lie in the promoter or any other control sequence of the gene. Transcription factors and RNA polymerase bind differentially to promoters depending on the sequence context, influencing the expression pattern of the gene. Recent evidence shows that SNPs in the promoter regions of TNF-α, IL-1β and a few other cytokines enhance their expression, rendering individuals more resistant to some bacterial and other pathogenic infections (El-Omar et al. 2000).
SNPs in the Coding Regions Affecting the Protein Structure:
A subtle alteration in the DNA can result in a drastic alteration in protein structure. The structural and functional role of apolipoprotein E (apoE) in lipoprotein metabolism, heart disease, and neurodegenerative diseases, including Alzheimer’s disease, has been established. ApoE is a 299-amino-acid protein with two functional domains.
The amino-terminal domain (residues 1-191) contains the low-density lipoprotein (LDL) receptor-binding region, and the carboxyl-terminal domain contains the major lipid-binding elements.
The three common human isoforms (apoE2, apoE3, and apoE4) differ at only two positions in the protein but have very different metabolic properties and dramatic impacts on disease (Fig. 20.3). ApoE3 (Cys-112, Arg-158) binds normally to the LDL receptor and is associated with normal lipid metabolism, whereas apoE2 (Cys-112, Cys-158) binds defectively to the LDL receptor and is associated with the genetic disorder type III hyperlipoproteinemia. ApoE4 (Arg-112, Arg-158) binds normally to the LDL receptor but is associated with elevated cholesterol levels and, hence, an increased risk of cardiovascular disease (Morrow et al. 2002). In addition, apoE4 has been observed to be a major risk factor for Alzheimer’s disease.
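Because the three isoforms are defined entirely by the residues at positions 112 and 158, the mapping given above can be written as a small lookup table. This is a sketch for illustration only: the names `APOE_ISOFORMS` and `apoe_isoform` are hypothetical, and the table covers only the three residue combinations named in the text.

```python
# (residue 112, residue 158) -> isoform, exactly as listed in the text.
APOE_ISOFORMS = {
    ("Cys", "Cys"): "apoE2",
    ("Cys", "Arg"): "apoE3",
    ("Arg", "Arg"): "apoE4",
}


def apoe_isoform(residue_112, residue_158):
    """Return the apoE isoform for the given residues, or 'unknown' if not listed."""
    return APOE_ISOFORMS.get((residue_112, residue_158), "unknown")


for residues in [("Cys", "Arg"), ("Cys", "Cys"), ("Arg", "Arg"), ("Arg", "Cys")]:
    print(residues, apoe_isoform(*residues))
```

The fourth combination (Arg-112, Cys-158) is not among the three common isoforms described above, so the sketch reports it as unknown and does not guess.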
A number of recent case studies on the effect of SNP on the structure and function of proteins have not only shown the specific structural alteration of the protein in disease, but have also given insights into the regulatory mechanisms of the native protein. Presence of a point mutation (Leu55Pro) in α1-antichymotrypsin, a protease inhibitor of the serpin superfamily, causes its loss of activity (Sunyaev et al. 2001).
The resulting change in the protein causes obstructive pulmonary disease. Similarly, a point mutation in human apolipoprotein A-1 (ApoA-1) is associated with coronary heart disease (Sunyaev et al. 2001). Such studies of the effects of single nucleotide changes on protein structure give us insights into both the causes of disease and the functions of proteins.
Similar studies have been done on the mu-opioid receptor gene, OPRM1. The mu-opioid receptor is the primary site of action for the most commonly used opioids, including morphine, heroin, fentanyl, and methadone. The most prevalent SNP, present in about 10% of the population, is a nucleotide substitution at position 118 (118 A > G), with a predicted amino acid change at a putative N-glycosylation site.
Although the variant protein resulting from the 118 A > G SNP did not show altered binding affinities for most opioid peptides and alkaloids tested, the variant receptor bound beta-endorphin, an opioid that activates the opioid receptor, approximately 3 times more tightly than the most common allelic form of the receptor.
Furthermore, beta-endorphin is approximately 3 times more potent at the 118 A > G variant receptor than at the common allelic form in agonist-induced activation of G protein-coupled potassium channels. These results suggest that the 118 A > G SNP in the opioid receptor gene may have implications for normal physiology and for vulnerability to diverse diseases, including addictive diseases (Bond et al. 1998).
The relation of SNPs to blood pressure (BP) has also been established. Single nucleotide changes in the angiotensinogen (AGT) gene are common in people with high blood pressure. A North American religious genetic isolate, the Hutterites, was tested for association between variation in systolic and diastolic blood pressure and the insertion/deletion polymorphism of angiotensin-converting enzyme (ACE) and two protein polymorphisms of AGT (viz. M235T and T174M).
The genotypes of codon 174 were significantly associated with variation in systolic blood pressure in men and accounted for 3.1% of the total variation. Homozygotes for AGT174M had the highest mean BP, followed by heterozygotes, and homozygotes for AGT174T had the lowest mean BP (Hegele et al. 1996).
SNPs as Genetic Markers:
Most SNPs are not responsible for a disease state. Instead, they may serve as biological markers for tracing disease genes on the human genome map. Since SNPs occur frequently throughout the genome and are relatively stable, they serve as excellent biological markers. Biological markers are DNA segments with a known physical location on a chromosome, which can be easily tracked and used to construct a chromosomal map showing the positions of known genes relative to each other.
These maps allow the study and identification of traits resulting from the interaction of more than one gene. Hence this strategy plays a major role in complex genetic disorders. SNP markers, although biallelic, are preferred over microsatellite markers because recurrent mutations are generally very rare in SNPs.
The National Center for Biotechnology Information (NCBI) plays an important role in facilitating the identification and cataloging of SNPs through the creation and maintenance of the public SNP database (dbSNP). The database can be accessed by the biomedical community worldwide and is intended to facilitate many areas of biological research.
SNP in Linkage Disequilibrium Studies:
Particular alleles at neighboring loci tend to be co-inherited. For tightly linked loci, this might lead to association between alleles in the population—a property known as linkage disequilibrium (LD) (Ardlie et al. 2002). The phenomenon of LD can be explained on the basis of co-segregation of two tightly linked alleles in a population, where one form of the haplotype is selected when the population experiences a bottleneck.
Later, the selected haplotype becomes the founder haplotype, as shown in Fig. 20.4. Mutation and recombination have the most evident impact on LD. Additional factors contributing to the extent and distribution of LD are genetic drift, population growth, population admixture, migration, natural selection, variable recombination and mutation rates, and gene conversion.
Measures of LD:
The Linkage Disequilibrium between Two Points A and B can be Calculated by the Expression:
$$D = P_{AB} - P_A \times P_B$$
where $P_{AB}$ is the frequency of the haplotype that carries alleles A and B, and
$P_A$ and $P_B$ are the frequencies of alleles A and B at loci A and B, respectively.
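The expression above can be evaluated directly from a list of observed haplotypes. In the sketch below, each haplotype is a pair (allele at locus A, allele at locus B). The function name `linkage_disequilibrium`, the lowercase labels for the alternative alleles, and the two toy samples are hypothetical.

```python
def linkage_disequilibrium(haplotypes, allele_a="A", allele_b="B"):
    """Compute D = P_AB - P_A * P_B from a list of (locus A, locus B) allele pairs."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_ab = sum(1 for a, b in haplotypes if a == allele_a and b == allele_b) / n
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    return p_ab - p_a * p_b


# Toy sample 1: alleles A and B always travel together (strong association).
linked = [("A", "B")] * 5 + [("a", "b")] * 5
# Toy sample 2: all four haplotypes equally frequent (no association).
unlinked = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]

print(linkage_disequilibrium(linked), linkage_disequilibrium(unlinked))
```

A value of zero means the haplotype frequency is exactly what independent assortment of the two alleles would give. A non-zero value indicates association between them.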
LD erodes over time and distance. Hence the factors ‘time’ and ‘distance’ should be taken into consideration when calculating LD.
If $D_0$ is the extent of disequilibrium at a starting point between two alleles whose loci are separated by a recombination fraction $r$, the disequilibrium $t$ generations later ($D_t$) is:
$$D_t = (1 - r)^t D_0$$
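The decay relation can be explored numerically. The sketch below reads r as the recombination fraction between the two loci per generation, which is an interpretation of the text and not something it states outright. The function name `ld_after_generations` and the starting values are hypothetical.

```python
def ld_after_generations(d0, r, t):
    """Return D_t = (1 - r)**t * D_0 for recombination fraction r and t generations."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("the recombination fraction r is expected to lie in [0, 0.5]")
    if t < 0:
        raise ValueError("t must be non-negative")
    return (1 - r) ** t * d0


# Hypothetical starting disequilibrium of 0.25, followed for a few generations.
for r in (0.5, 0.01):
    trajectory = [ld_after_generations(0.25, r, t) for t in (0, 1, 2, 10)]
    print(r, trajectory)
```

With r = 0.5 (unlinked loci) the disequilibrium halves every generation, whereas with r = 0.01 (tightly linked loci) most of it is still present after ten generations. This is why closely spaced SNPs remain associated for a long time.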
Complex Diseases and SNP:
Most of the genes responsible for major monogenic disorders have been mapped by positional cloning. These disorders follow the Mendelian pattern of inheritance. Diseases such as diabetes, cancer, asthma, and rheumatoid arthritis do not show any such clear pattern of inheritance. Such disorders are referred to as complex or multifactorial diseases.
It is hypothesized that single nucleotide polymorphisms can be used for tracking genes responsible for complex disorders. For that purpose, SNPs which show co-segregation with a certain disease can be used as markers to identify (map) the loci responsible for the disease.
SNPs identified in candidate genes for a complex disease could be used to determine an individual’s susceptibility to the disease. Once an individual is affected, the SNP profile of genes related to drug targets and drug metabolism could be used to determine the efficacy of the available drugs for therapeutic purposes.
The International HapMap Project: Understanding the Common Human Genetic Variations:
As described earlier, complex interactions among multiple genes, environmental factors and lifestyle result in common diseases such as diabetes, cancer, stroke, cardiac diseases, psychiatric disorders, and asthma. Although any two unrelated individuals are the same at about 99.9% of their DNA sequence, the remaining 0.1% is important because it contains the genetic variants that provide their unique identity and also influence variability in their risk of disease and response to drugs. Discovering the DNA sequence variants that contribute to common disease risk offers one of the best opportunities for understanding the complex causes of disease in humans.
A centralized, multinational effort was required to discover, map and validate the common variations in the human genome. The first step towards this knowledge was realized in 2001, with the completion of the human genome sequence. Consequently, the International HapMap Project was initiated.
The goal of the Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation. The Project is a powerful tool and a database, intended to facilitate the discovery of genetic contributions to common diseases, and can be used in studies that compare the patterns of genetic variation (haplotypes) in people with a specific disease to patterns in people without the disease.
By identifying regions of the human genome that show differences in haplotype patterns, particular genetic variants that contribute to the disease can be identified more easily. To produce the HapMap, researchers analyzed blood samples from a total of 269 people from four large populations.
These populations are: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Han Chinese in Beijing; and Utah residents with ancestry from northern and western Europe. These four populations were selected to include people with ancestry from widely separated geographic regions: European (the Utah residents), African (Yoruba), and East Asian (Han Chinese and Japanese). Notably, the Indian or South Asian population is not represented in the HapMap.
In India several initiatives have been taken to study the genetic variations in the Indian population, the major one being ‘The Indian Genome Variation database (IGVdb)’. The HapMap and the IGVdb projects have the potential to unravel the genetic basis of complex diseases leading to discoveries for prevention and treatment of such diseases.
An Indian Initiative:
The Indian subcontinent has been a melting pot of different populations and cultures since the dawn of civilization, and because of its diversity and ancient lineage its population can serve as an ideal model for studying single nucleotide profiles and their variety. The Indian population comprises more than a billion people, consisting of 4693 communities with thousands of endogamous groups, 325 functioning languages and 25 scripts.
SNPs serve as an ideal genetic tool for understanding the origin, evolution, diversity and migration patterns of the population. Furthermore, predisposition to complex disorders and variable sensitivity and reaction to different drugs are of prime importance.
For this purpose, six constituent laboratories of the Council of Scientific and Industrial Research (CSIR), in collaboration with other premier research institutes of India, initiated a network program, the Indian Genome Variation (IGV) consortium, to identify and validate SNPs and polymorphic repeat sequences in thousands of genes of the human genome. These genes have been selected on the basis of their relevance as functional and positional candidates in many common diseases, and include genes relevant to pharmacogenomics.
A review of the planned study has been published in Human Genetics (The Indian Genome Variation database IGVdb, 2005). This is the first large-scale comprehensive effort from India to understand existing genome variation and put it to use in the drug industry.
The blood samples on which the DNA analysis is to be done are being collected from multiple indigenous tribes and populations of India. The data are expected to give insight into the structure and evolution of the Indian population, with far-reaching implications for the study of common complex diseases and pharmacogenomics.