CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes

Bibliographic Details
Title: CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes
Authors: Kuhl, Heiner; Li, Ling; Wuertz, Sven; Stöck, Matthias; Liang, Xu-Fang; Klopp, Christophe
Contributors: Department of Ecophysiology and Aquaculture, Leibniz Institute of Freshwater Ecology and Inland Fisheries, Leibnitz-Leibnitz, College of Fisheries, Chinese Perch Research Center, Huazhong Agricultural University, Innovation Base for Chinese Perch Breeding, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, Unité de Mathématiques et Informatique Appliquées de Toulouse (MIAT INRA), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Système d'Information des GENomes des Animaux d'Elevage (SIGENAE), German Research Foundation (DFG) - KU 3596/1-1, 324050651
Source: GigaScience
GigaScience, BioMed Central, 2020, 9 (5), ⟨10.1093/gigascience/giaa034⟩
GigaScience, 9(5):giaa034
Publisher: HAL CCSD, 2020.
Publication year: 2020
Subject terms: chromosomes, AcademicSubjects/SCI02254, Computational Biology, High-Throughput Nucleotide Sequencing, Genomics, Sequence Analysis, DNA, genome scaffolding, long-read, comparative genomics, genome evolution, Synteny, SEQUENCE, EVOLUTION, INSIGHTS, SIZE, vertebrates, genome assembly, [SDE]Environmental Sciences, Technical Note, AcademicSubjects/SCI00960, Animals, CONSERVED SYNTENY, [INFO]Computer Science [cs], Software
Description: Background: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. Results: Chromosome-Scale Assembler (CSA) is a novel, computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., genomes conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads. Conclusions: CSA can speed up and simplify chromosome-level assembly and significantly lower the costs of large-scale, family-level vertebrate genome projects.
Language: English
ISSN: 2047-217X
Open access: https://explore.openaire.eu/search/publication?articleId=pmid_dedup__::e201e9297b7b02c504f5e09f20876957
https://hal.inrae.fr/hal-03182981
Rights: OPEN
Accession number: edsair.pmid.dedup....e201e9297b7b02c504f5e09f20876957
Database: OpenAIRE