The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets

Bibliographic details
Title: The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets
Authors: A. Geoffrey Lyle, Jacquelyn M. Roger, Matthew A. Cattle, Katrina Learned, Robert Currie, Sofie R. Salama, Holly C. Beale, Lauren Sanders, John Vivian, Olena M. Vaske, Du Linh Lam, Ellen Kephart, Drew K A Thompson, Isabel Bjork, Jacob Pfeil, David Haussler, Liam T. McKay
Source: GigaScience, vol 10, iss 3
Publication details: eScholarship, University of California, 2021.
Publication year: 2021
Subject terms: depth, AcademicSubjects/SCI02254, unmapped, Health Informatics, Computational biology, exonic, Biology, Deep sequencing, Whole Exome Sequencing, 03 medical and health sciences, 0302 clinical medicine, Neoplasms, Exome Sequencing, Technical Note, Genetics, Humans, RNA-Seq, Child, 030304 developmental biology, Cancer, 0303 health sciences, Sequence Analysis, RNA, Gene Expression Profiling, Human Genome, Reproducibility of Results, High-Throughput Nucleotide Sequencing, sequencing, Pediatric cancer, Computer Science Applications, duplicate, quality, AcademicSubjects/SCI00960, RNA, Sequence Analysis, 030217 neurology & neurosurgery, Biotechnology
Description: Background: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) depends on sequencing depth. Unmapped and non-exonic reads do not contribute to gene expression quantification; duplicate reads do contribute to quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of the reproducibility of RNA-Seq datasets used for gene expression analysis. Findings: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1–77% of all reads (median [IQR], 3% [3–6%]); duplicate reads constitute 3–100% of mapped reads (median [IQR], 27% [13–43%]); and non-exonic reads constitute 4–97% of mapped, non-duplicate reads (median [IQR], 25% [16–37%]). MEND reads constitute 0–79% of total reads (median [IQR], 50% [30–61%]). Conclusions: Because not all reads in an RNA-Seq dataset are informative for the reproducibility of gene expression measurements, and because the informative fraction varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than in total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
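The fractions in the description use nested denominators: unmapped reads are a share of all reads, duplicates a share of mapped reads, non-exonic reads a share of mapped non-duplicate reads, and MEND reads a share of all reads. The sketch below shows that arithmetic only. It is not the authors' script, which per the description relies on RSeQC, sambamba, and samblaster; the names `ReadCounts` and `mend_summary` and the example counts are hypothetical and assume the four counts have already been obtained from an alignment file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCounts:
    """Hypothetical container for the four nested read counts of one sample."""
    total: int                  # all sequenced reads
    mapped: int                 # reads that aligned to the reference
    mapped_non_duplicate: int   # mapped reads left after duplicate removal
    mend: int                   # mapped, exonic, non-duplicate reads


def _fraction(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def mend_summary(counts: ReadCounts) -> dict:
    """Compute the four fractions, each against its own denominator."""
    nested = (counts.total, counts.mapped, counts.mapped_non_duplicate, counts.mend)
    # Each category is a subset of the previous one, so counts cannot increase.
    if any(n < 0 for n in nested) or list(nested) != sorted(nested, reverse=True):
        raise ValueError("expected total >= mapped >= mapped_non_duplicate >= mend >= 0")
    return {
        # share of all reads that did not map
        "unmapped_of_total": _fraction(counts.total - counts.mapped, counts.total),
        # share of mapped reads that are duplicates
        "duplicate_of_mapped": _fraction(
            counts.mapped - counts.mapped_non_duplicate, counts.mapped),
        # share of mapped non-duplicate reads that fall outside exons
        "non_exonic_of_mapped_non_duplicate": _fraction(
            counts.mapped_non_duplicate - counts.mend, counts.mapped_non_duplicate),
        # share of all reads that are MEND reads
        "mend_of_total": _fraction(counts.mend, counts.total),
    }


if __name__ == "__main__":
    # Made-up counts for illustration; not taken from the study.
    example = ReadCounts(total=100_000_000, mapped=95_000_000,
                         mapped_non_duplicate=70_000_000, mend=50_000_000)
    for name, value in mend_summary(example).items():
        print(f"{name}: {value:.1%}")
```

Because each fraction has a different denominator, the per-category percentages do not add up to 100%, and cohort medians of the individual fractions cannot be multiplied together to recover the median MEND fraction.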
File description: application/pdf
Open access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e0c41d1a09b6a7b588df765f9b7c0686
https://escholarship.org/uc/item/2fq331n9
Rights: OPEN
Accession number: edsair.doi.dedup.....e0c41d1a09b6a7b588df765f9b7c0686
Database: OpenAIRE