Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets
Title: | Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets |
---|---|
Authors: | Matthew Greenway, Raymond Powell, David A. Hanley, Allison Heath, Rafael D. Suarez, Robert L. Grossman, Chai Bandlamudi, Kevin P. White, Megan E. McNerney, Jonathan Spring |
Source: | Journal of the American Medical Informatics Association : JAMIA |
Publication info: | Oxford University Press (OUP), 2014. |
Publication year: | 2014 |
Subject terms: | Computer science, Datasets as Topic, Health Informatics, Cloud computing, Clustered file system, Research and Applications, computer.software_genre, 03 medical and health sciences, Upload, 0302 clinical medicine, Computer Systems, Humans, biomedical clouds, 030304 developmental biology, Internet, 0303 health sciences, Database, business.industry, cloud computing, Genomics, Systems Integration, Phenotype, Virtual machine, 030220 oncology & carcinogenesis, Middleware (distributed applications), System integration, genomic clouds, Single sign-on, business, computer, Software |
Description: | Background: As large genomics and phenotypic datasets become more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Methods: Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal and associated middleware that provides a single entry point and a single sign-on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results: Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions: Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. |
ISSN: | 1527-974X 1067-5027 |
Open access: | https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4e04f50b5e3881b5c724a62d6d2b790d https://doi.org/10.1136/amiajnl-2013-002155 |
Rights: | OPEN |
Accession number: | edsair.doi.dedup.....4e04f50b5e3881b5c724a62d6d2b790d |
Database: | OpenAIRE |
---|