Academic Journal

Joint Representation Learning for Retrieval and Annotation of Genomic Interval Sets

Bibliographic Details
Title: Joint Representation Learning for Retrieval and Annotation of Genomic Interval Sets
Authors: Erfaneh Gharavi, Nathan J. LeRoy, Guangtao Zheng, Aidong Zhang, Donald E. Brown, Nathan C. Sheffield
Source: Bioengineering, Vol 11, Iss 3, p 263 (2024)
Publisher Information: MDPI AG, 2024.
Publication Year: 2024
Collection: LCC:Technology
LCC:Biology (General)
Subject Terms: genomic intervals, search, metadata, embeddings, representation learning, information retrieval, Technology, Biology (General), QH301-705.5
Description: As available genomic interval data increase in scale, we require fast systems to search them. A common approach is simple string matching to compare a search term to metadata, but this is limited by incomplete or inaccurate annotations. An alternative is to compare data directly through genomic region overlap analysis, but this approach leads to challenges like sparsity, high dimensionality, and computational expense. We require novel methods to quickly and flexibly query large, messy genomic interval databases. Here, we develop a genomic interval search system using representation learning. We train numerical embeddings for a collection of region sets simultaneously with their metadata labels, capturing similarity between region sets and their metadata in a low-dimensional space. Using these learned co-embeddings, we develop a system that solves three related information retrieval tasks using embedding distance computations: retrieving region sets related to a user query string, suggesting new labels for database region sets, and retrieving database region sets similar to a query region set. We evaluate these use cases and show that jointly learned representations of region sets and metadata are a promising approach for fast, flexible, and accurate genomic region information retrieval.
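The description says that all three retrieval tasks become embedding distance computations once region sets and metadata labels share one space. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The 3-dimensional vectors and the names (`label_vecs`, `region_set_vecs`, `search_by_label`, `suggest_labels`, `similar_region_sets`, the region set identifiers) are all hypothetical. The vectors are hand-picked stand-ins for trained co-embeddings. Cosine similarity is assumed as the distance measure, a query string is assumed to already match a label in the vocabulary, and a query region set is assumed to be embedded already.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k(query, candidates, k):
    """Return the k candidate names whose vectors are most cosine-similar to query."""
    scored = sorted(candidates.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

# Hypothetical co-embeddings (assumption): labels and region sets live in one 3-d space.
label_vecs = {
    "CTCF": [1.0, 0.1, 0.0],
    "H3K27ac": [0.0, 1.0, 0.1],
    "liver": [0.1, 0.0, 1.0],
}
region_set_vecs = {
    "rs_ctcf_k562": [0.9, 0.2, 0.1],
    "rs_h3k27ac_hepg2": [0.1, 0.8, 0.6],
    "rs_liver_dnase": [0.0, 0.1, 0.9],
}

# Task 1: query string -> related region sets (assumes the string is a known label).
def search_by_label(label, k=2):
    return top_k(label_vecs[label], region_set_vecs, k)

# Task 2: suggest labels for an existing database region set.
def suggest_labels(region_set, k=2):
    return top_k(region_set_vecs[region_set], label_vecs, k)

# Task 3: already-embedded query region set -> similar database region sets.
def similar_region_sets(query_vec, k=2):
    return top_k(query_vec, region_set_vecs, k)

print(search_by_label("CTCF", k=1))                # ['rs_ctcf_k562']
print(suggest_labels("rs_liver_dnase", k=2))       # ['liver', 'H3K27ac']
print(similar_region_sets([0.1, 0.9, 0.5], k=1))   # ['rs_h3k27ac_hepg2']
```

Because labels and region sets share one space, the same `top_k` routine serves all three tasks. Only the roles of query and candidate pool change between them. A production system would replace the linear scan with an approximate nearest-neighbor index.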
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2306-5354
Relation: https://www.mdpi.com/2306-5354/11/3/263; https://doaj.org/toc/2306-5354
DOI: 10.3390/bioengineering11030263
Open Access: https://doaj.org/article/b6fd5ffe0c914c6ca9308cf5c69f38e8
Accession Number: edsdoj.b6fd5ffe0c914c6ca9308cf5c69f38e8
Database: Directory of Open Access Journals