Biologia plantarum 2016, 60:619-627 | DOI: 10.1007/s10535-016-0649-8

A plant biologists' guide to phylogenetic analysis of biological macromolecule sequences

F. Cvrčková1,*
1 Department of Experimental Plant Biology, Faculty of Sciences, Charles University, Prague, Czech Republic

Phylogenetic analysis has become a common step in the characterization of gene and protein sequences. However, despite the availability of numerous affordable and more-or-less intuitive software tools, constructing biologically relevant, informative phylogenetic trees remains a process involving several critical steps that are inherently non-algorithmic, i.e., dependent on decisions made by the user. These steps include, but are not limited to, setting the aims of the phylogenetic study, choosing the sequences to be analyzed, and selecting the methods used to construct the sequence alignment, as well as the algorithms and parameters used to construct the actual phylogenetic tree. This review aims to provide guidance for these decisions and to illustrate common pitfalls and problems that occur during phylogenetic analysis of plant gene sequences.

Keywords: bioinformatics; evolution; phylogenetic tree; protein domain identification; sequence alignment; sequence database searching
Subjects: phylogenetic analysis; bioinformatics; protein domain identification; amino acid sequences; database searching

Received: December 9, 2015; Revised: March 10, 2016; Accepted: April 12, 2016; Published: December 1, 2016

Cvrčková, F. (2016). A plant biologists' guide to phylogenetic analysis of biological macromolecule sequences. Biologia plantarum, 60(4), 619-627. doi: 10.1007/s10535-016-0649-8.
