Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets

Bibliographic details
Title: Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets
Authors: Matthew Greenway, Raymond Powell, David A. Hanley, Allison Heath, Rafael D. Suarez, Robert L. Grossman, Chai Bandlamudi, Kevin P. White, Megan E. McNerney, Jonathan Spring
Source: Journal of the American Medical Informatics Association: JAMIA
Publication info: Oxford University Press (OUP), 2014.
Publication year: 2014
Subject terms: Computer science, Datasets as Topic, Health Informatics, Cloud computing, Clustered file system, Research and Applications, 03 medical and health sciences, Upload, 0302 clinical medicine, Computer Systems, Humans, biomedical clouds, 030304 developmental biology, Internet, 0303 health sciences, Database, Genomics, Systems Integration, Phenotype, Virtual machine, 030220 oncology & carcinogenesis, Middleware (distributed applications), System integration, genomic clouds, Single sign-on, Software
Description: Background: As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Methods: Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal and associated middleware that provides a single entry point and a single sign-on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results: Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions: Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.
ISSN: 1527-974X
1067-5027
Open access: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4e04f50b5e3881b5c724a62d6d2b790d
https://doi.org/10.1136/amiajnl-2013-002155
Rights: OPEN
Accession number: edsair.doi.dedup.....4e04f50b5e3881b5c724a62d6d2b790d
Database: OpenAIRE