Academic Journal

GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis.

Bibliographic Details
Title: GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis.
Authors: Li, Qian1 (AUTHOR), Fisher, Kate2,3 (AUTHOR), Meng, Wenjun2 (AUTHOR), Fang, Bin4 (AUTHOR), Welsh, Eric2 (AUTHOR), Haura, Eric B5 (AUTHOR), Koomen, John M6 (AUTHOR), Eschrich, Steven A2 (AUTHOR), Fridley, Brooke L2 (AUTHOR), Chen, Y Ann2 (AUTHOR) ann.chen@moffitt.org
Source: Bioinformatics. 1/1/2020, Vol. 36 Issue 1, p257-263. 7p.
Subject Terms: *LABELS, *STANDARD deviations, *MASS spectrometry, *SPECTRUM analysis, *RENAL cell carcinoma
Abstract: Motivation: Missingness in label-free mass spectrometry is inherent to the technology. A computational approach to recover missing values in metabolomics and proteomics datasets is important. Most existing methods are designed under a particular assumption, either missing at random or missing below the detection limit. If the missing pattern deviates from the assumption, the imputation may lead to biased results. Hence, we investigate the missing patterns in label-free mass spectrometry data and develop an omnibus approach, GMSimpute, to allow effective imputation that accommodates different missing patterns. Results: Three proteomics datasets and one metabolomics dataset indicate that missing values could be a mixture of abundance-dependent and abundance-independent missingness. We assess the performance of GMSimpute using simulated data (with a wide range of 80 missing patterns) and metabolomics data from the Cancer Genome Atlas breast cancer and clear cell renal cell carcinoma studies. Using Pearson correlation and normalized root mean square error between the true and imputed abundance, we compare its performance to K-nearest-neighbor-type approaches, Random Forest, GSimp, a model-based method implemented in DanteR, and minimum-value imputation. The results indicate that GMSimpute provides higher imputation accuracy and exhibits stable performance across different missing patterns. In addition, GMSimpute is able to identify the features in downstream differential expression analysis with high accuracy when applied to the Cancer Genome Atlas datasets. Availability and implementation: GMSimpute is on CRAN: https://cran.r-project.org/web/packages/GMSimpute/index.html. Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
Database: Academic Search Index
Description
ISSN:1367-4803
DOI:10.1093/bioinformatics/btz488