Error-correcting DNA barcodes for high-throughput sequencing

Bibliographic details
Title: Error-correcting DNA barcodes for high-throughput sequencing
Authors: Hawkins, John A., Jones Jr., Stephen K., Finkelstein, Ilya J., Press, William H.
Publication details: Cold Spring Harbor Laboratory, 2018.
Publication year: 2018
Subject terms: 0303 health sciences, DNA synthesis, Computer science, Concatenation, Computational biology, Barcode, DNA sequencing, law.invention, 03 medical and health sciences, Identification (information), chemistry.chemical_compound, 0302 clinical medicine, chemistry, law, Dna barcodes, Hamming code, 030217 neurology & neurosurgery, GC-content, Decoding methods, DNA, 030304 developmental biology
Description: Many large-scale high-throughput experiments use DNA barcodes (short DNA sequences prepended to DNA libraries) to identify individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming and Levenshtein codes) do not properly account for insertions and deletions in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate FREE (Filled/truncated Right End Edit) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced GC content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error-correction levels that may be useful in diverse high-throughput applications, including >10⁶ single-error-correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10¹⁵ error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.

SIGNIFICANCE STATEMENT
Modern high-throughput biological assays study pooled populations of individual members by labeling each member with a unique DNA sequence called a "barcode." DNA barcodes are frequently corrupted by DNA synthesis and sequencing errors, leading to significant data loss and incorrect data interpretation. Here, we describe a novel error-correction strategy to improve the efficiency and statistical power of DNA barcodes. To our knowledge, this is the first report of an error-correcting method that accurately handles insertions and deletions in DNA barcodes, the most common type of error encountered during DNA synthesis and sequencing, resulting in order-of-magnitude increases in accuracy, efficiency, and signal-to-noise. The accompanying software package makes deployment of these barcodes effortless for the broader experimental scientist community.
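The description mentions two steps common to barcode pipelines: screening candidate sequences at design time (balanced GC content, short homopolymer runs) and decoding an observed read back to the nearest valid barcode. The Python sketch below illustrates both under stated assumptions. The helper names (`gc_fraction`, `max_homopolymer`, `passes_filters`, `decode`) and the thresholds (40 to 60% GC, runs of at most 2) are hypothetical, and decoding uses plain Levenshtein distance, a swapped-in baseline technique. It is not the authors' FREE software or decoding algorithm.

```python
# Minimal sketch; helper names and thresholds are hypothetical, NOT from the FREE software.
from __future__ import annotations


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base (non-empty sequence)."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def passes_filters(seq: str, gc_min: float = 0.4, gc_max: float = 0.6, max_run: int = 2) -> bool:
    """Design-time screen: balanced GC content and short homopolymer runs (assumed thresholds)."""
    return gc_min <= gc_fraction(seq) <= gc_max and max_homopolymer(seq) <= max_run


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, and deletions (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute, or match at no cost
        prev = cur
    return prev[-1]


def decode(observed: str, barcodes: list[str], max_dist: int = 1) -> str | None:
    """Return the unique nearest barcode within max_dist, or None if none qualifies or there is a tie."""
    scored = sorted((levenshtein(observed, bc), bc) for bc in barcodes)
    if not scored or scored[0][0] > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two barcodes are equally close
    return scored[0][1]


# Usage: "AAAGCG" is rejected for its run of three A's; "ACGAC" is "ACGTAC" with one deletion.
library = [bc for bc in ["ACGTAC", "TGCATG", "GATCGA", "AAAGCG"] if passes_filters(bc)]
print(library)                   # ['ACGTAC', 'TGCATG', 'GATCGA']
print(decode("ACGAC", library))  # ACGTAC
```

A fixed-length nearest-neighbor search like this is the kind of approach the description says falls short for indels. As the name Filled/truncated Right End Edit suggests, in a real read a deletion pulls a downstream library base into the barcode window and an insertion pushes the last barcode base out of it; that length-altering effect is what FREE barcodes are designed to correct, and this sketch does not model it.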
Open access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b28f22a6702bb3b165a5e867fbdad070
https://doi.org/10.1101/315002
Rights: OPEN
Accession number: edsair.doi.dedup.....b28f22a6702bb3b165a5e867fbdad070
Database: OpenAIRE