Academic Journal

DeepTSS: multi-branch convolutional neural network for transcription start site identification from CAGE data

Bibliographic Details
Title: DeepTSS: multi-branch convolutional neural network for transcription start site identification from CAGE data
Authors: Grigoriadis D., Perdikopanis N., Georgakilas G.K., Hatzigeorgiou A.G.
Source: BMC Bioinformatics ; https://www.scopus.com/inward/record.uri?eid=2-s2.0-85143992698&doi=10.1186%2fs12859-022-04945-y&partnerID=40&md5=ea8c15d758c3369a47bd8df3fa6a68d1
Publication Year: 2022
Collection: University of Thessaly Institutional Repository / Ιδρυματικό Αποθετήριο Πανεπιστημίου Θεσσαλίας
Subject Terms: Bioinformatics, Computational methods, Convolution, Convolutional neural networks, Deep learning, Learning systems, Molecular biology, Proteins, Signal processing, Signal to noise ratio, Cap analysis of gene expression, Gene Expression Data, Genes expression, Genomic-signal-processing, Machine-learning, Promoter, Protein-coding genes, Technical noise, Transcription start site, Transcription, chromatin, promoter region, software, transcription initiation site, Neural Networks, Computer, Promoter Regions, Genetic, BioMed Central Ltd
Description: Background: The widespread use of Cap Analysis of Gene Expression (CAGE) has led to numerous breakthroughs in understanding transcription mechanisms. Recent evidence in the literature, however, suggests that CAGE suffers from transcriptional and technical noise. Regardless of sample quality, a significant number of CAGE peaks are not associated with transcription initiation events. This type of signal is typically attributed to technical noise and, more frequently, to random five-prime capping or transcription byproducts. Hence there is a need for computational methods that can accurately increase the signal-to-noise ratio in CAGE data, resulting in error-free transcription start site (TSS) annotation and quantification of regulatory region usage. In this study, we present DeepTSS, a novel computational method for processing CAGE samples that combines genomic signal processing (GSP), structural DNA features, evolutionary conservation evidence and raw DNA sequence with Deep Learning (DL) to provide single-nucleotide TSS predictions with unprecedented levels of performance. Results: To evaluate DeepTSS, we utilized experimental data, protein-coding gene annotations and computationally derived genome segmentations by chromatin states. DeepTSS was found to outperform existing algorithms on all benchmarks, achieving 98% precision and 96% sensitivity (accuracy 95.4%) on the protein-coding gene strategy, with 96.66% of its positive predictions overlapping active chromatin, and 98.27% and 92.04% co-localizing with at least one transcription factor and H3K4me3 peak, respectively. Conclusions: CAGE is a key protocol in deciphering the language of transcription; however, like every experimental protocol, it suffers from biological and technical noise that can severely affect downstream analyses. DeepTSS is a novel DL-based method for effectively removing noisy CAGE signal.
In contrast to existing software, DeepTSS does not require feature selection since the embedded convolutional layers can readily identify patterns ...
Document Type: article in journal/newspaper
Language: English
ISSN: 1471-2105
Relation: http://hdl.handle.net/11615/73704
DOI: 10.1186/s12859-022-04945-y
Availability: https://doi.org/10.1186/s12859-022-04945-y
http://hdl.handle.net/11615/73704
Accession Number: edsbas.23AFDF35
Database: BASE