Academic Journal

Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants

Bibliographic Details
Title: Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants
Authors: Jiadong Lin, Xiaofei Yang, Walter Kosters, Tun Xu, Yanyan Jia, Songbo Wang, Qihui Zhu, Mallory Ryan, Li Guo, Chengsheng Zhang, Charles Lee, Scott E. Devine, Evan E. Eichler, Kai Ye, Mark B. Gerstein, Ashley D. Sanders, Micheal C. Zody, Michael E. Talkowski, Ryan E. Mills, Jan O. Korbel, Tobias Marschall, Peter Ebert, Peter A. Audano, Bernardo Rodriguez-Martin, David Porubsky, Marc Jan Bonder, Arvis Sulovari, Jana Ebler, Weichen Zhou, Rebecca Serra Mari, Feyza Yilmaz, Xuefang Zhao, PingHsun Hsieh, Joyce Lee, Sushant Kumar, Tobias Rausch, Yu Chen, Zechen Chong, Katherine M. Munson, Mark J.P. Chaisson, Junjie Chen, Xinghua Shi, Aaron M. Wenger, William T. Harvey, Patrick Hansenfeld, Allison Regier, Ira M. Hall, Paul Flicek, Alex R. Hastie, Susan Fairely
Source: Genomics, Proteomics & Bioinformatics, Vol 20, Iss 1, Pp 205-218 (2022)
Publisher Information: Elsevier, 2022.
Publication Year: 2022
Collection: LCC:Biology (General)
Subject Terms: Next-generation sequencing, Complex structural variant, Pattern growth, Graph mining, Formation mechanism, Biology (General), QH301-705.5
Description: Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered to be the simultaneous occurrence of multiple simple structural variants. However, the compound mutational signals of CSVs are difficult to detect with the commonly used model-match strategy. As a result, there has been limited progress in CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided, model-free strategy to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, in which the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, the validation rate of CSVs on real data, based on experimental and computational validation as well as manual inspection, is around 70%, with median breakpoint shifts of 13 bp for experimental and 26 bp for computational validation. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on CSV formation. Mako is publicly available at https://github.com/xjtu-omics/Mako.
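The abstract describes the approach only at a high level. The following is a minimal, hypothetical Python sketch of the general idea, not Mako's implementation. Breakpoints become graph nodes, read-supported links between them become edges (kept only above an assumed `min_support` threshold), and a pattern is grown from each seed node until no connected node can be added. The growth step here is a simple seed-and-extend traversal (equivalent to finding connected components), swapped in for Mako's actual pattern-growth mining. The function names, the `(chromosome, position)` node encoding, and the threshold are all assumptions for illustration. Following the abstract's definition, patterns with more than two breakpoints are reported as CSV candidates.

```python
from collections import defaultdict


def build_graph(links, min_support=2):
    """Hypothetical input: one (breakpoint_a, breakpoint_b) pair per supporting read.
    Edges supported by fewer than min_support reads are discarded."""
    counts = defaultdict(int)
    for a, b in links:
        if a != b:
            counts[frozenset((a, b))] += 1
    graph = defaultdict(set)
    for edge, n in counts.items():
        if n >= min_support:
            a, b = tuple(edge)
            graph[a].add(b)  # undirected: record the connection both ways
            graph[b].add(a)
    return graph


def grow_patterns(graph):
    """Seed-and-extend growth (a simplified stand-in for pattern growth):
    start from an unvisited node and keep adding connected breakpoints
    until the pattern is maximal."""
    seen = set()
    patterns = []
    for seed in sorted(graph):
        if seed in seen:
            continue
        pattern = {seed}
        frontier = [seed]
        while frontier:
            node = frontier.pop()
            for nbr in graph[node]:
                if nbr not in pattern:
                    pattern.add(nbr)
                    frontier.append(nbr)
        seen |= pattern
        patterns.append(sorted(pattern))
    return patterns


def complex_candidates(patterns):
    # CSVs are defined as having more than two breakpoints.
    return [p for p in patterns if len(p) > 2]


if __name__ == "__main__":
    links = (
        [(("chr1", 100), ("chr1", 500))] * 3
        + [(("chr1", 500), ("chr1", 900))] * 2
        + [(("chr2", 10), ("chr2", 80))] * 2
        + [(("chr1", 900), ("chr3", 5))]  # single read: below min_support
    )
    print(complex_candidates(grow_patterns(build_graph(links))))
    # prints [[('chr1', 100), ('chr1', 500), ('chr1', 900)]]
```

In this toy input, the chr1 links chain three breakpoints into one pattern that is reported as a candidate, the two-breakpoint chr2 pattern is treated as a simple structural variant, and the single-read chr1/chr3 link is dropped by the support threshold.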
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1672-0229
Relation: http://www.sciencedirect.com/science/article/pii/S1672022921001431; https://doaj.org/toc/1672-0229
DOI: 10.1016/j.gpb.2021.03.007
Open Access: https://doaj.org/article/513b1cf5af894e5190b4427f8d55b1a7
Accession Number: edsdoj.513b1cf5af894e5190b4427f8d55b1a7
Database: Directory of Open Access Journals