Academic Journal

The efficient phasing and imputation pipeline of low‐coverage whole genome sequencing data using a high‐quality and publicly available reference panel in cattle

Bibliographic Details
Title: The efficient phasing and imputation pipeline of low‐coverage whole genome sequencing data using a high‐quality and publicly available reference panel in cattle
Authors: Zhuangbiao Zhang, Ao Wang, Honghong Hu, Lulu Wang, Mian Gong, Qimeng Yang, Anguo Liu, Ran Li, Huanhuan Zhang, Qianqian Zhang, Ali Mujtaba Shah, Xihong Wang, Yachun Wang, Quanzhong Liu, Liutao Gao, Zhipeng Zhang, Congyong Wang, Yun Ma, Yudong Cai, Yu Jiang
Source: Animal Research and One Health, Vol 1, Iss 1, Pp 4-16 (2023)
Publication Information: Wiley, 2023.
Publication Year: 2023
Collection: LCC:Animal culture
LCC:Animal biochemistry
Subject Terms: cattle, imputation, lcWGS, phasing, reference panel, Animal culture, SF1-1100, Animal biochemistry, QP501-801
Description: Abstract: Low‐coverage whole genome sequencing (lcWGS) has great potential to genotype large‐scale populations effectively and to provide solid data for imputation; however, the time required for imputation needs to be optimized. There is also no publicly available reference panel for whole genome selection in cattle. Here, we propose a combination of Beagle v5.4 for phasing and GLIMPSE2 for imputation, which is fast and accurate for cattle lcWGS data. Furthermore, by evaluating the diversity and size of the reference panel, we established a multi‐breed reference panel with 61.8 million SNPs based on 2976 cattle from around the world, of which 1766 were bulls. Imputation accuracy was evaluated using the new reference panel for both lcWGS and Bovine BeadChip data. The average concordance rate in Holstein was 99.6%, 99.6%, and 99.5% for 1X, 0.5X, and 0.1X lcWGS data, and 99.5% and 99.0% for 777K and 50K chip data; in Simmental, it was 98.8% for 1X lcWGS data. We further investigated the factors affecting the imputation accuracy of lcWGS data and found that segmental duplication, structural variants, and guanine‐cytosine content were the top three factors. Interestingly, we found that 10 regions longer than 0.5 Mb showed low imputation accuracy and were enriched for immune function, for example 96.1% of the characterized genes in the regions on chromosome 10, so more attention should be paid to downstream immune‐related analyses. Our study provides a workflow for imputing lcWGS data and establishes the first high‐quality, freely accessible cattle reference panel, which provides a resource for subsequent large‐scale genome‐wide association studies and genomic selection.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2835-5075
Relation: https://doaj.org/toc/2835-5075
DOI: 10.1002/aro2.8
Open Access: https://doaj.org/article/32fe922ab4dd487d98e3ea208dd09a53
Accession Number: edsdoj.32fe922ab4dd487d98e3ea208dd09a53
Database: Directory of Open Access Journals