HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads

Bibliographic Details
Title: HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads
Authors: Glennis A. Logsdon, Karen H. Miga, Brian P. Walenz, Robert Grothe, Sergey Koren, Arang Rhie, Evan E. Eichler, Mitchell R. Vollger, Adam M. Phillippy, Sergey Nurk
Source: Genome Res
Publication Details: Cold Spring Harbor Laboratory, 2020.
Publication Year: 2020
Subject Terms: Computer science, Sequence assembly, Method, Computational biology, Biology, DNA, Satellite, Genome, Cell Line, 03 medical and health sciences, 0302 clinical medicine, Chromosome Duplication, Genetics, Animals, Humans, Allele, Genetics (clinical), Alleles, 030304 developmental biology, Segmental duplication, 0303 health sciences, Contig, Genome, Human, Haplotype, Genetic Variation, High-Throughput Nucleotide Sequencing, Reproducibility of Results, DNA, Neoplasm, Sequence Analysis, DNA, Haplotypes, Human genome, Drosophila, Nanopore sequencing, Ploidy, 030217 neurology & neurosurgery, Software
Description: Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced PacBio HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a significant modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultra-long Oxford Nanopore reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of 9 complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance towards the complete assembly of human genomes. Availability: HiCanu is implemented within the Canu assembly framework and is available from https://github.com/marbl/canu.
Language: English
DOI: 10.1101/2020.03.14.992248
Open Access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::30d0cca2ff7274e4f60f3ed21b9288ed
Rights: OPEN
Accession Number: edsair.doi.dedup.....30d0cca2ff7274e4f60f3ed21b9288ed
Database: OpenAIRE
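The abstract relies on three quantities that are easy to misread out of context: homopolymer compression, the QV50 consensus quality, and the NG50 contig size. The Python sketch below gives textbook definitions of each, assuming plain uppercase DNA strings and a known (or estimated) genome size for NG50. The function names `homopolymer_compress`, `accuracy_to_qv`, and `ng50` are hypothetical illustrations, not code taken from HiCanu or Canu.

```python
import math
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases to one base, e.g. 'AAAC' -> 'AC'."""
    # groupby yields (base, run) pairs for consecutive identical characters.
    return "".join(base for base, _ in groupby(seq))


def accuracy_to_qv(accuracy: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(error_rate)."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1); QV is unbounded at 1.0")
    return -10.0 * math.log10(1.0 - accuracy)


def ng50(contig_lengths: list[int], genome_size: int) -> int:
    """Largest length L such that contigs of length >= L cover >= 50% of genome_size."""
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= genome_size / 2:
            return length
    # Hypothetical convention: report 0 when the assembly spans < 50% of the genome.
    return 0


if __name__ == "__main__":
    print(homopolymer_compress("GGATTTTACCA"))  # GATACA
    # Reads differing only in a homopolymer run length become identical.
    print(homopolymer_compress("ACGTTTTA") == homopolymer_compress("ACGTTTA"))  # True
    print(round(accuracy_to_qv(0.99999)))  # 50
    print(ng50([40, 30, 20], 100))  # 30
```

Two reads that differ only in the length of a homopolymer run (for example `ACGTTTTA` and `ACGTTTA`) compress to the same string, so comparisons made in compressed space are insensitive to run-length errors. The QV conversion shows that 99.999% per-base accuracy corresponds to one error per 100,000 bases, which is QV50.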