How do similarities in the amino acid sequences of their proteins provide information about the evolutionary relationship between two species?

Classification by Molecular Sequences

Organisms can also be classified based on their molecular sequences (molecular phylogeny). This branch of phylogeny analyses hereditary molecular differences, predominantly in DNA sequences, to infer the evolutionary relationships between organisms. Molecular data of this kind are often analysed using cladistics, which groups organisms according to shared derived characteristics.

  • All living organisms on Earth are thought to have one common ancestor.
  • If two species have a similar set of proteins, chromosomes or DNA sequences, it provides evidence that they share a recent common ancestor.
  • Note that ‘recent’ in evolutionary terms can mean hundreds of thousands or even millions of years!
  • We can align the DNA sequences of different species (or the amino acid sequences of their proteins) and count the positions at which the bases (or amino acids) match.
  • The greater the difference between the sequences, the more time is presumed to have passed since the two species shared a common ancestor.
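
A minimal Python sketch of the comparison described in these points, assuming the two sequences are already aligned to the same length (real sequences would first need to be aligned). It computes the p-distance, the proportion of aligned sites that differ; the sequences in the example are made up for illustration and are not real data.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count mismatching bases position by position
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical aligned DNA fragments (illustrative only)
reference = "ATGCCGTA"
species_x = "ATGCTGTA"  # 1 mismatch out of 8 sites
species_y = "TTGATGCA"  # 4 mismatches out of 8 sites

print(p_distance(reference, species_x))  # 0.125
print(p_distance(reference, species_y))  # 0.5
```

Assuming differences accumulate at a roughly constant rate, species X would be inferred to share a more recent common ancestor with the reference than species Y does. The raw proportion ignores repeated substitutions at the same site, so it tends to underestimate divergence between distantly related species; corrected distances (for example the Jukes–Cantor model) are normally used for such comparisons.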

One protein commonly studied to determine the relatedness of species is cytochrome-c, a protein that functions in the electron transport chain of cellular respiration. Because its sequence has changed only slowly over millions of years of evolution, it can be compared even between distantly related species: the more similar the cytochrome-c of two species, the more recently they are presumed to have shared a common ancestor. The table below shows part of the cytochrome-c amino acid sequence of several species, aligned position by position.

Human    Gln Pro Tyr Ser Thr Ala Lys Asn Lys Ile Gly Glu Asp Thr Leu Met Glu Lys Ala Thr Asn Glu
Chicken  Gln Glu Phe Ser Thr Asp Lys Asn Lys Thr Gly Glu Asp Thr Leu Met Glu Lys Ala Thr Ser Lys
Horse    Gln Pro Phe Thr Thr Ala Lys Asn Lys Thr Lys Glu Glu Thr Leu Met Glu Lys Ala Thr Asn Glu
Frog     Gln Ala Phe Ser Thr Asp Lys Asn Lys Thr Gly Gly Asp Thr Leu Met Glu Ser Ala Cys Ser Lys
Shark    Gln Gln Phe Ser Thr Asp Lys Ser Lys Thr Gln Gln Glu Thr Leu Arg Ile Lys Thr Ala Ala Ser
Monkey   Gln Pro Tyr Ser Thr Ala Lys Asn Lys Thr Gly Glu Asp Thr Leu Met Glu Lys Ala Thr Asn Glu
Rabbit   Gln Val Phe Ser Thr Asp Lys Asn Lys Thr Gly Glu Asp Thr Leu Met Glu Lys Ala Thr Asn Th

Using the table, organise the species from most closely related to least closely related to humans.
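
One way to approach this is to count, for each species, the positions at which its residue differs from the human residue. The following is a minimal Python sketch using the 22 residues transcribed from the table above; the function names are illustrative, and the final rabbit residue, which appears truncated to "Th" in the source, is compared as given.

```python
# Cytochrome-c fragments transcribed from the table, aligned position by position.
# The final rabbit residue is truncated to "Th" in the source and is kept as given;
# it differs from the human residue (Glu) either way.
SEQUENCES = {
    "Human":   "Gln Pro Tyr Ser Thr Ala Lys Asn Lys Ile Gly Glu Asp Thr Leu Met Glu Lys Ala Thr Asn Glu",
    "Chicken": "Gln Glu Phe Ser Thr Asp Lys Asn Lys Thr Gly Glu Asp Thr Leu Met Glu Lys Ala Thr Ser Lys",
    "Horse":   "Gln Pro Phe Thr Thr Ala Lys Asn Lys Thr Lys Glu Glu Thr Leu Met Glu Lys Ala Thr Asn Glu",
    "Frog":    "Gln Ala Phe Ser Thr Asp Lys Asn Lys Thr Gly Gly Asp Thr Leu Met Glu Ser Ala Cys Ser Lys",
    "Shark":   "Gln Gln Phe Ser Thr Asp Lys Ser Lys Thr Gln Gln Glu Thr Leu Arg Ile Lys Thr Ala Ala Ser",
    "Monkey":  "Gln Pro Tyr Ser Thr Ala Lys Asn Lys Thr Gly Glu Asp Thr Leu Met Glu Lys Ala Thr Asn Glu",
    "Rabbit":  "Gln Val Phe Ser Thr Asp Lys Asn Lys Thr Gly Glu Asp Thr Leu Met Glu Lys Ala Thr Asn Th",
}


def count_differences(seq_a: list[str], seq_b: list[str]) -> int:
    """Count aligned positions at which two residue lists differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def rank_by_similarity(reference: str, sequences: dict[str, str]) -> list[tuple[str, int]]:
    """Return (species, differences) pairs, fewest differences from the reference first."""
    ref = sequences[reference].split()
    scores = [
        (name, count_differences(ref, seq.split()))
        for name, seq in sequences.items()
        if name != reference
    ]
    # sorted() is stable, so species with equal counts keep their table order
    return sorted(scores, key=lambda pair: pair[1])


for name, diffs in rank_by_similarity("Human", SEQUENCES):
    print(f"{name:<8} {diffs}")
```

On these data the sketch reports 1 difference for monkey, 5 for horse, 5 for rabbit, 6 for chicken, 9 for frog and 14 for shark, which suggests the order monkey, horse and rabbit (tied), chicken, frog, shark. A raw count over a 22-residue fragment is only a rough guide: this fragment alone cannot separate horse from rabbit, and full-length sequences compared with substitution models would give firmer conclusions.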

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Multivariate linear regression models between amino acid composition and evolutionary rate. (A) The 273 genome pairs belong to 18 phyla; the corresponding genome-pair counts and the average R² of the multivariate linear regression between amino acid composition and evolutionary rate are shown. (B) For the 273 organisms, the total coefficient of determination R² ranged from 0 to 0.6 (P < 0.05). (C) GC content influences the total R² of the multivariate linear regression between amino acid composition and evolutionary rate. (D) Genome size is negatively correlated with the total R². (E) Evolutionary rates of proteins in the five model organisms and the corresponding average: 0.26, 0.11, 0.13, 0.16 and 0.15.
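
The panels above report R² values from multivariate linear regressions of evolutionary rate on amino acid composition. As a hedged illustration of how such an R² can be computed, here is a minimal ordinary-least-squares sketch in NumPy. It is not the authors' pipeline: the function name, the number of proteins, the weights and the noise level are all synthetic assumptions chosen only to make the example run.

```python
import numpy as np


def regression_r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Fit y ~ X by ordinary least squares with an intercept and return R^2."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)          # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    return 1.0 - ss_res / ss_tot


# Synthetic example: 200 hypothetical proteins, each described by the fractions of
# the 20 amino acids (each row sums to 1), with a made-up linear relation to "rate".
rng = np.random.default_rng(0)
composition = rng.dirichlet(np.ones(20), size=200)
true_weights = rng.normal(size=20)
rate = composition @ true_weights + rng.normal(scale=0.05, size=200)

# The fractions sum to 1, so one column is dropped to avoid exact collinearity
# with the intercept column.
r2 = regression_r_squared(composition[:, :-1], rate)
print(f"R^2 = {r2:.3f}")
```

R² is the share of the variance in evolutionary rate explained by the composition variables, so values between 0 and 0.6, as in panel (B), would mean that amino acid composition accounts for at most about 60% of that variance in a given genome.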
