Academic Journal

cDNA-detector: detection and removal of cDNA contamination in DNA sequencing libraries

Bibliographic Details
Title: cDNA-detector: detection and removal of cDNA contamination in DNA sequencing libraries
Authors: Meifang Qi, Utthara Nayar, Leif S. Ludwig, Nikhil Wagle, Esther Rheinbay
Source: BMC Bioinformatics, Vol 22, Iss 1, Pp 1-14 (2021)
Publisher Information: BMC, 2021.
Publication Year: 2021
Collection: LCC:Computer applications to medicine. Medical informatics
LCC:Biology (General)
Subject Terms: Contamination, Genomics, Software, Quality control, cDNA, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Description: Abstract
Background: Exogenous cDNA introduced into an experimental system, either intentionally or accidentally, can appear as additional read coverage over that gene in next-generation sequencing (NGS) libraries derived from the system. If not properly recognized and managed, this cross-contamination with exogenous signal can lead to incorrect interpretation of research results. Yet this problem is not routinely addressed in current sequence-processing pipelines.
Results: We present cDNA-detector, a computational tool to identify and remove exogenous cDNA contamination in DNA sequencing experiments. We demonstrate that cDNA-detector can identify cDNAs quickly and accurately from alignment files. A source inference step attempts to separate endogenous cDNAs (retrocopied genes) from potential cloned, exogenous cDNAs. cDNA-detector also provides a mechanism to decontaminate the alignment of detected cDNAs. Simulation studies show that cDNA-detector is highly sensitive and specific, outperforming existing tools. We apply cDNA-detector to several highly cited public databases (TCGA, ENCODE, NCBI SRA) and show that contaminant genes appear in sequencing experiments, where they lead to incorrect coverage peak calls.
Conclusions: cDNA-detector is a user-friendly and accurate tool to detect and remove cDNA contamination in NGS libraries. Its two-step design (detection, then removal) reduces the risk of removing true variants, since it allows manual review of candidates. We find that contamination with intentionally and accidentally introduced cDNAs is an underappreciated problem even in widely used consortium datasets, where it can lead to spurious results. Our findings highlight the importance of sensitive detection and removal of contaminant cDNA from NGS libraries before downstream analysis.
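A minimal sketch of one detection signal, under stated assumptions. When a spliced cDNA is sequenced and the reads are aligned to the genome, a read crossing an exon-exon junction can typically align only up to the exon edge and is soft-clipped there, because the rest of the read matches the next exon rather than the intron. The Python below flags genes whose internal exon boundaries are enriched for such clip events. The `Exon` and `ClippedRead` types, the function `flag_candidate_genes`, and the thresholds `min_reads` and `min_fraction` are hypothetical illustrations, not cDNA-detector's API or its exact algorithm; the source inference and removal steps are not shown.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass(frozen=True)
class ClippedRead:
    # Hypothetical minimal read model: only the soft-clip event matters here.
    side: str      # "right": clip follows the aligned block; "left": clip precedes it
    clip_pos: int  # exclusive end of the aligned block (right) or its inclusive start (left)


def flag_candidate_genes(exons, reads, min_reads=2, min_fraction=0.5):
    """Return {gene: fraction of internal exon boundaries supported by soft clips}.

    Hypothetical thresholds: a boundary counts as supported when at least
    `min_reads` clip events land exactly on it, and a gene is flagged when the
    supported fraction reaches `min_fraction`.
    """
    by_gene = defaultdict(list)
    for exon in exons:
        by_gene[exon.gene].append(exon)

    # Index clip events by (side, position) so each boundary lookup is O(1).
    clip_counts = Counter((read.side, read.clip_pos) for read in reads)

    flagged = {}
    for gene, gene_exons in by_gene.items():
        gene_exons.sort(key=lambda e: e.start)
        if len(gene_exons) < 2:
            continue  # single-exon gene: no junctions to test
        # Internal boundaries only: ends of all but the last exon (right clips)
        # and starts of all but the first exon (left clips).
        boundaries = [("right", e.end) for e in gene_exons[:-1]]
        boundaries += [("left", e.start) for e in gene_exons[1:]]
        supported = sum(1 for b in boundaries if clip_counts[b] >= min_reads)
        fraction = supported / len(boundaries)
        if fraction >= min_fraction:
            flagged[gene] = fraction
    return flagged


if __name__ == "__main__":
    exons = [
        Exon("GENE_A", 100, 200), Exon("GENE_A", 500, 600), Exon("GENE_A", 900, 1000),
        Exon("GENE_B", 2000, 2100), Exon("GENE_B", 2400, 2500),
    ]
    # Toy clip events: GENE_A looks spliced (cDNA-like); GENE_B has one stray clip.
    reads = (
        [ClippedRead("right", 200)] * 3 + [ClippedRead("right", 600)] * 2
        + [ClippedRead("left", 500)] * 2 + [ClippedRead("left", 900)] * 3
        + [ClippedRead("right", 2100)]
    )
    print(flag_candidate_genes(exons, reads))  # {'GENE_A': 1.0}
```

Matching clip positions exactly is a simplification; real alignments may need a small positional tolerance. A gene with many clipped boundaries can also come from an endogenous retrocopy rather than a cloned construct, which is why the abstract describes a separate source inference step before removal.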
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1471-2105
Relation: https://doaj.org/toc/1471-2105
DOI: 10.1186/s12859-021-04529-2
Open Access: https://doaj.org/article/e63abeda4c1d400b9d4b0c88a680ac57
Accession Number: edsdoj.63abeda4c1d400b9d4b0c88a680ac57
Database: Directory of Open Access Journals