TM prediction
Transmembrane alpha-helix structure prediction algorithms we use
- TOPCONS - SCAMPI (position-specific amino acid contributions to the free energy of membrane insertion)
- TOPCONS: http://topcons.net; von Heijne's web page: http://www.sbc.su.se/. Proc Natl Acad Sci U S A. 2008 May 20;105(20):7177-81. Prediction of membrane-protein topology from first principles. Bernsel A, Viklund H, Falk J, Lindahl E, von Heijne G, Elofsson A. Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden. The current best membrane-protein topology-prediction methods are typically based on sequence statistics and contain hundreds of parameters that are optimized on known topologies of membrane proteins. However, because the insertion of transmembrane helices into the membrane is the outcome of molecular interactions among protein, lipids and water, it should be possible to predict topology by methods based directly on physical data, as proposed >20 years ago by Kyte and Doolittle. Here, we present two simple topology-prediction methods using a recently published experimental scale of position-specific amino acid contributions to the free energy of membrane insertion that perform on a par with the current best statistics-based topology predictors. This result suggests that prediction of membrane-protein topology and structure directly from first principles is an attainable goal, given the recently improved understanding of peptide recognition by the translocon. PMID: 18477697
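The abstract above points back to the physical-data approach of Kyte and Doolittle. As a rough illustration of that older idea only (this is not SCAMPI and not the TOPCONS implementation), the sketch below averages the Kyte-Doolittle hydropathy index over a sliding window and reports windows above a cutoff. The window length of 19 and the cutoff of 1.6 are assumed here as commonly quoted defaults, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index, one value per standard amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Return the mean hydropathy of every full-length window along seq."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Return 0-based start positions of windows whose mean exceeds threshold.

    The window and threshold values are assumptions for illustration.
    """
    return [
        i for i, h in enumerate(window_hydropathy(seq, window))
        if h > threshold
    ]


if __name__ == "__main__":
    # Toy sequence: a 19-residue leucine stretch flanked by charged residues.
    toy = "D" * 10 + "L" * 19 + "K" * 10
    print(candidate_tm_windows(toy))
```

A single fixed scale like this ignores where in the membrane a residue sits, which is the gap the position-specific free-energy scale described in the abstract is meant to address.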
- TOPCONS - OCTOPUS algorithm (HMM & neural networks)
- Bioinformatics. 2008 Aug 1;24(15):1662-8. OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammar. Viklund H, Elofsson A. Department of Biochemistry and Biophysics/Center for Biomembrane Research/Stockholm Bioinformatics Center, The Arrhenius Laboratories for Natural Sciences, Stockholm University, SE-10691 Stockholm, Sweden. MOTIVATION: As alpha-helical transmembrane proteins constitute roughly 25% of a typical genome and are vital parts of many essential biological processes, structural knowledge of these proteins is necessary for increasing our understanding of such processes. Because structural knowledge of transmembrane proteins is difficult to attain experimentally, improved methods for prediction of structural features of these proteins are important. RESULTS: OCTOPUS, a new method for predicting transmembrane protein topology is presented and benchmarked using a dataset of 124 sequences with known structures. Using a novel combination of hidden Markov models and artificial neural networks, OCTOPUS predicts the correct topology for 94% of the sequences. In particular, OCTOPUS is the first topology predictor to fully integrate modeling of reentrant/membrane-dipping regions and transmembrane hairpins in the topological grammar. AVAILABILITY: OCTOPUS is available as a web server at http://octopus.cbr.su.se. PMID: 18474507
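Topology predictors generally report one label per residue, which then has to be collapsed into segments before helices can be counted or compared. The sketch below shows that step under an assumed labelling ("i" inside, "o" outside, "M" membrane helix); the alphabet and function names are hypothetical and are not taken from the OCTOPUS output format.

```python
from itertools import groupby


def topology_segments(topology):
    """Collapse a per-residue label string into (label, start, end) runs.

    Positions are 1-based and inclusive, as is usual for protein sequences.
    """
    segments = []
    pos = 1
    for label, run in groupby(topology):
        length = len(list(run))
        segments.append((label, pos, pos + length - 1))
        pos += length
    return segments


def count_tm_helices(topology, tm_label="M"):
    """Count runs carrying the (assumed) transmembrane label."""
    return sum(
        1 for label, _, _ in topology_segments(topology) if label == tm_label
    )


if __name__ == "__main__":
    example = "iiiiMMMMMMooooMMMMMMiiii"
    print(topology_segments(example))
    print(count_tm_helices(example))
```

Extra labels, such as one for reentrant regions, would pass through `topology_segments` unchanged because it makes no assumption about the alphabet.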
- Phobius (HMM for TM and SP predictions)
- http://phobius.sbc.su.se/ Nucleic Acids Res. 2007 Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Käll L, Krogh A, Sonnhammer EL. Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden. lukall@u.washington.edu When using conventional transmembrane topology and signal peptide predictors, such as TMHMM and SignalP, there is a substantial overlap between these two types of predictions. Applying these methods to five complete proteomes, we found that 30-65% of all predicted signal peptides and 25-35% of all predicted transmembrane topologies overlap. This impairs predictions of 5-10% of the proteome, hence this is an important issue in protein annotation. To address this problem, we previously designed a hidden Markov model, Phobius, that combines transmembrane topology and signal peptide predictions. The method makes an optimal choice between transmembrane segments and signal peptides, and also allows constrained and homology-enriched predictions. We here present a web interface (http://phobius.cgb.ki.se and http://phobius.binf.ku.dk) to access Phobius. PMID: 17483518
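The overlap problem described in the abstract can be checked for any single protein by comparing the residue ranges reported by two separate predictors. The sketch below flags predicted transmembrane segments that share residues with a predicted signal peptide. The coordinates are invented for illustration and the function names are hypothetical; this only detects the conflict and does not reproduce how Phobius resolves it.

```python
def overlap_length(a, b):
    """Number of residues shared by two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def conflicting_predictions(signal_peptide, tm_segments):
    """Return the TM segments that overlap the predicted signal peptide."""
    return [
        seg for seg in tm_segments
        if overlap_length(signal_peptide, seg) > 0
    ]


if __name__ == "__main__":
    # Invented coordinates: a signal peptide at residues 1-22 and two
    # predicted TM helices, the first of which covers the same N-terminus.
    sp = (1, 22)
    tm = [(5, 27), (60, 80)]
    print(conflicting_predictions(sp, tm))
```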
- TMpro (HMM, NN & linear classifier algorithms)
- BMC Bioinformatics. 2008;9 Suppl 1:S4. Transmembrane helix prediction using amino acid property features and latent semantic analysis. Ganapathiraju M, Balakrishnan N, Reddy R, Klein-Seetharaman J. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA. madhavi+@cs.cmu.edu BACKGROUND: Prediction of transmembrane (TM) helices by statistical methods suffers from lack of sufficient training data. Current best methods use hundreds or even thousands of free parameters in their models which are tuned to fit the little data available for training. Further, they are often restricted to the generally accepted topology "cytoplasmic-transmembrane-extracellular" and cannot adapt to membrane proteins that do not conform to this topology. Recent crystal structures of channel proteins have revealed novel architectures showing that the above topology may not be as universal as previously believed. Thus, there is a need for methods that can better predict TM helices even in novel topologies and families. RESULTS: Here, we describe a new method "TMpro" to predict TM helices with high accuracy. To avoid overfitting to existing topologies, we have collapsed cytoplasmic and extracellular labels to a single state, non-TM. TMpro is a binary classifier which predicts TM or non-TM using multiple amino acid properties (charge, polarity, aromaticity, size and electronic properties) as features. The features are extracted from sequence information by applying the framework used for latent semantic analysis of text documents and are input to neural networks that learn the distinction between TM and non-TM segments. The model uses only 25 free parameters. In benchmark analysis TMpro achieves 95% segment F-score corresponding to 50% reduction in error rate compared to the best methods not requiring an evolutionary profile of a protein to be known. Performance is also improved when applied to more recent and larger high resolution datasets PDBTM and MPtopo. TMpro predictions in membrane proteins with unusual or disputed TM structure (K+ channel, aquaporin and HIV envelope glycoprotein) are discussed. CONCLUSION: TMpro uses very few free parameters in modeling TM segments as opposed to the very large number of free parameters used in state-of-the-art membrane prediction methods, yet achieves very high segment accuracies. This is highly advantageous considering that high resolution transmembrane information is available only for very few proteins. The greatest impact of TMpro is therefore expected in the prediction of TM segments in proteins with novel topologies. Further, the paper introduces a novel method of extracting features from protein sequence, namely that of latent semantic analysis model. The success of this approach in the current context suggests that it can find potential applications in other sequence-based analysis problems. AVAILABILITY: http://linzer.blm.cs.cmu.edu/tmpro/ and http://flan.blm.cs.cmu.edu/tmpro/ PMID: 18315857
- Bioinformatics. 2007 Oct 15;23(20):2795-6. TMpro web server and web service: transmembrane helix prediction through amino acid property analysis. Ganapathiraju M, Jursa CJ, Karimi HA, Klein-Seetharaman J. Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. TMpro is a transmembrane (TM) helix prediction algorithm that uses language processing methodology for TM segment identification. It is primarily based on the analysis of statistical distributions of properties of amino acids in transmembrane segments. This article describes the availability of TMpro on the internet via a web interface. The key features of the interface are: (i) output is generated in multiple formats including a user-interactive graphical chart which allows comparison of TMpro predicted segment locations with other labeled segments input by the user, such as predictions from other methods. (ii) Up to 5000 sequences can be submitted at a time for prediction. (iii) TMpro is available as a web server and is published as a web service so that the method can be accessed by users as well as other services depending on the need for data integration. Availability: http://linzer.blm.cs.cmu.edu/tmpro/ (web server and help), http://blm.sis.pitt.edu:8080/axis/services/TMProFetcherService (web service). PMID: 17724062
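The TMpro benchmark is reported as a segment F-score, which combines segment-level precision and recall. The sketch below is one plausible way to compute such a score, assuming a predicted segment counts as correct when it overlaps a not-yet-matched observed segment by at least `min_overlap` residues. That matching rule, the default of 3 residues, and the function names are assumptions for illustration; the exact criterion used in the paper is not restated on this page.

```python
def overlap(a, b):
    """Number of residues shared by two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def segment_f_score(predicted, observed, min_overlap=3):
    """Harmonic mean of segment precision and recall.

    Each observed segment can be matched by at most one predicted segment
    (greedy, in input order). min_overlap is an assumed matching criterion.
    """
    matched_observed = set()
    true_positives = 0
    for p in predicted:
        for j, o in enumerate(observed):
            if j not in matched_observed and overlap(p, o) >= min_overlap:
                matched_observed.add(j)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(observed) if observed else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example: three predicted helices, two observed ones.
    predicted = [(10, 30), (50, 70), (100, 120)]
    observed = [(12, 32), (52, 72)]
    print(segment_f_score(predicted, observed))
```

If the error rate is read as one minus the F-score (an interpretation, not something the abstract states), a 95% F-score is a 5% error, and a 50% reduction would put the compared methods at roughly 10% error.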