Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Bibliographic details
Title: Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
Authors: Barbara Robbertse, Patrick Masterson, Jinna Choi, Sanjida H. Rangwala, Vyacheslav Brover, Olga Blinkova, Stacy Ciufo, Kim D. Pruitt, Conrad L. Schoch, Daniel H. Haft, Kelly M. McGarvey, Richard McVeigh, Raymond E. Tully, Bhanu Rajput, Alexander Astashyn, Vinita Joardar, Wratko Hlavina, Vamsi K. Kodali, Kathleen O'Neill, Danso Ako-adjei, Hanzhen Sun, Craig Wallin, Mathew W. Wright, Michael DiCuccio, Daniel Rausch, Catherine M. Farrell, Susan S. Storz, Avi Kimchi, Terence Murphy, Tripti Gupta, Eneida L. Hatcher, Donna Maglott, Shashikant Pujar, Brian Smith-White, David Webb, Nuala A. O'Leary, Wenjun Li, Michael R. Murphy, Igor Tolstoy, Françoise Thibaud-Nissen, Diana Haddad, Olga Ermolaeva, Azat Badretdin, Andrei Shkeda, Lillian D. Riddick, Tatiana Tatusova, Wendy Wu, Melissa J. Landrum, Vyacheslav Chetvernin, Tamara Goldfarb, Anjana R. Vatsan, Paul Kitts, J. Rodney Brister, Yiming Bao, Eric Cox
Source: Nucleic Acids Research
Publication information: Oxford University Press (OUP), 2015.
Publication year: 2015
Subject terms: 0301 basic medicine; Nematoda; Genomics; Genome, Viral; Biology; computer.software_genre; Mice; 03 medical and health sciences; Sequence Analysis, Protein; Databases, Genetic; Genetics; RefSeq; Animals; Humans; Database Issue; Phylogeny; Comparative genomics; Database; Genome, Human; Sequence Analysis, RNA; GENCODE; Gene Expression Profiling; Molecular Sequence Annotation; Genome project; Reference Standards; Invertebrates; Rats; Genome, Microbial; 030104 developmental biology; Vertebrates; Cattle; RNA, Long Noncoding; Human genome; Genome, Fungal; computer; Genome, Plant; Reference genome
Description: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55 000 organisms (>4800 viruses, >40 000 prokaryotes and >10 000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
ISSN: 1362-4962, 0305-1048
DOI: 10.1093/nar/gkv1189
Open access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7fa63f34080e34e61b5a99d6a1c2b83d
Rights: OPEN
Accession number: edsair.doi.dedup.....7fa63f34080e34e61b5a99d6a1c2b83d
Database: OpenAIRE