Academic Journal
Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone
Title: | Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone |
---|---|
Authors: | Ruiz-Blanco Y.B., Agüero-Chapin G., García-Hernández E., Álvarez O., Antunes A., Green J. |
Contributors: | CIIMAR - Centro Interdisciplinar de Investigação Marinha e Ambiental |
Publisher Information: | BMC |
Publication Year: | 2017 |
Collection: | Repositório Aberto da Universidade do Porto |
Subject Terms: | Alignment, Bacteria, Benchmarking, Classification (of information), Encoding (symbols), Enzymes, Support vector machines, Topology, Antibacterial peptides, Computational predictions, Descriptors, Post-translational modifications, ProtDCal, Protein analysis, Sequence based features, TI2BioP, Proteins |
Description: | Background: Computational prediction of protein function constitutes one of the more complex problems in Bioinformatics, because of the diversity of functions and mechanisms that proteins exert in nature. This issue is reinforced especially for proteins that share very low primary or tertiary structure similarity to existing annotated proteomes. In this sense, new alignment-free (AF) tools are needed to overcome the inherent limitations of classic alignment-based approaches to this issue. We have recently introduced AF protein-numerical-encoding programs (TI2BioP and ProtDCal), whose sequence-based features have been successfully applied to detect remote protein homologs, post-translational modifications and antibacterial peptides. Here we aim to demonstrate the applicability of four AF protein descriptor families, implemented in our programs, for the identification of enzyme-like proteins. At the same time, the use of our novel family of 3D-structure-based descriptors is introduced for the first time. The Dobson & Doig (D&D) benchmark dataset is used for the evaluation of our AF protein descriptors, because of its proven structural diversity that permits one to emulate an experiment within the twilight zone of alignment-based methods (pair-wise identity <30%). The performance of our sequence-based predictor was further assessed using a subset of formerly uncharacterized proteins which currently represent a benchmark annotation dataset. Results: Four protein descriptor families (sequence-composition-based (0D), linear-topology-based (1D), pseudo-fold-topology-based (2D) and 3D-structure features (3D)) were assessed using the D&D benchmark dataset. We show that only the families of ProtDCal's descriptors (0D, 1D and 3D) encode significant information for discriminating enzymes from non-enzymes. The obtained 3D-structure-based classifier ranked first among several other SVM-based methods assessed in this dataset. Furthermore, the model leveraging 1D descriptors showed a higher success rate than EzyPred ... |
Document Type: | article in journal/newspaper |
File Description: | application/pdf |
Language: | English |
ISSN: | 1471-2105 |
Relation: | info:eu-repo/grantAgreement/FCT/5876/147268/PT; BMC Bioinformatics, vol. 18(1):349; http://dx.doi.org/10.1186/s12859-017-1758-x; https://hdl.handle.net/10216/120519 |
DOI: | 10.1186/s12859-017-1758-x |
Availability: | https://doi.org/10.1186/s12859-017-1758-x https://hdl.handle.net/10216/120519 |
Rights: | info:eu-repo/semantics/openAccess |
Accession Number: | edsbas.FB1148E1 |
Database: | BASE |