Dissertation

Targeted analyses of very large genome-wide data collections

Bibliographic Details
Title: Targeted analyses of very large genome-wide data collections
Authors: Lee, Young-suk
Contributors: Troyanskaya, Olga, Computer Science Department
Publication Information: Princeton University
Publication Year: 2016
Collection: DataSpace at Princeton University
Subject Terms: data integration, functional network, genome-wide data, human diseases, machine learning, ontology, Computer science, Bioinformatics, Molecular biology
Description: Genome-scale experiments provide an overwhelming amount of molecular information for biologists. New computational methods are needed for specific analysis and interpretation of such high-dimensional data. Here we take advantage of the massive public repositories to quantify the tissue-specific signals in gene expression profiles, characterize distinctive molecular features of human diseases, deconvolve the latent cell-type-specific factors in mixed clinical samples, and automatically integrate heterogeneous data sources in the context of a specific genome-wide dataset. First, we describe URSA (Unveiling RNA Sample Annotation), which incorporates the known tissue/cell-type relationships to better estimate the specific signal in any given gene expression profile. Our ontology-aware method combines independent discriminative classifiers in a Bayesian framework, outperforming other machine learning methods. We provide a molecular interpretation for the tissue and cell-type models learned by URSA, enabling a data-driven view of molecular processes specific to particular tissues and cell types. Then, we extend this work to human diseases. We use thousands of clinical disease-specific expression profiles in public repositories to quantify distinctive functional and anatomical characteristics of human diseases. Through our data-driven analysis, we explore the complexity of the human disease landscape and propose exploratory hypotheses for drug repurposing, even for rare diseases with no prior genetic knowledge. Lastly, we describe YETI (Your Evidence Tailored Integration) for targeted integration of heterogeneous genome-wide data sources. Biomedical researchers generate genome-wide datasets for data-driven exploration of specific questions, but such analyses are disconnected from big public data collections. YETI is the first automatic integration method that effectively constructs functional networks specific to a genome-scale dataset. We show that the resulting integration reflects the biological context of the user-provided ...
Document Type: doctoral or postdoctoral thesis
Language: English
Relation: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: http://catalog.princeton.eduTest/; http://arks.princeton.edu/ark:/88435/dsp018k71nk490Test
Availability: http://arks.princeton.edu/ark:/88435/dsp018k71nk490Test
Accession Number: edsbas.22702612
Database: BASE