Academic Journal
A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites
Title: | A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites |
---|---|
Authors: | Zeng, Yan |
Source: | Masters Theses |
Publisher Information: | Tennessee Research and Creative Exchange |
Publication Year: | 2011 |
Collection: | University of Tennessee, Knoxville: Trace |
Subject Terms: | missing data imputation, predictive modeling, partial least squares regression, LASSO, Adaptive LASSO, BART, Applied Statistics, Statistical Methodology, Statistical Models |
Description: | Problem: Real-time process and destructive test data were collected from a wood composite manufacturer in the U.S. to develop real-time predictive models of two key strength properties (Modulus of Rupture (MOR) and Internal Bond (IB)) of a wood composite manufacturing process. Sensor malfunctions and data “send/retrieval” problems led to null fields in the company’s data warehouse, which resulted in information loss. Many manufacturers attempt to build accurate predictive models either by excluding entire records with null fields or by substituting summary statistics such as the mean or median for the null field. However, predictive model errors in validation may be higher in the presence of information loss. In addition, the selection of predictive modeling methods poses another challenge to many wood composite manufacturers. Approach: This thesis consists of two parts addressing the above issues: 1) how to improve data quality using missing data imputation; 2) which predictive modeling method is better in terms of prediction precision (measured by root mean square error, or RMSE). The first part summarizes an application of missing data imputation methods in predictive modeling. After variable selection, two missing data imputation methods were selected after comparing six possible methods. Predictive models of imputed data were developed using partial least squares regression (PLSR) and compared with models of non-imputed data using ten-fold cross-validation. Root mean square error of prediction (RMSEP) and normalized RMSEP (NRMSEP) were calculated. The second part presents a series of comparisons among four predictive modeling methods using imputed data without variable selection. Results: The first part concludes that the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov Chain Monte Carlo (MCMC) simulation achieved more precise results. Predictive models based on imputed datasets generated more precise prediction results (average NRMSEP of 5.8% for the MOR model and 7.2% for the IB model) than models ... |
Document Type: | text |
File Description: | application/pdf |
Language: | unknown |
Relation: | https://trace.tennessee.edu/utk_gradthes/1041Test; https://trace.tennessee.edu/context/utk_gradthes/article/2159/viewcontent/Yan_Zeng_Thesis_Final_Version.pdfTest |
Availability: | https://trace.tennessee.edu/utk_gradthes/1041Test https://trace.tennessee.edu/context/utk_gradthes/article/2159/viewcontent/Yan_Zeng_Thesis_Final_Version.pdfTest |
Accession Number: | edsbas.EF679820 |
Database: | BASE |
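
Note on the evaluation metrics: the description reports RMSEP and NRMSEP from ten-fold cross-validation, but this record does not say how RMSEP was normalized. The sketch below is only an illustration under stated assumptions: NRMSEP is taken as RMSEP divided by the mean of the observed values, a one-variable least-squares line stands in for PLSR, and the data are synthetic. None of the function names or numbers come from the thesis.

```python
import math
import random


def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nrmsep(observed, predicted):
    """RMSEP divided by the mean observed value (assumed normalization)."""
    return rmsep(observed, predicted) / (sum(observed) / len(observed))


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a simple stand-in for PLSR)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_predictions(xs, ys, k=10, seed=0):
    """Return one out-of-fold prediction per record using k-fold CV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]
    return preds


# Hypothetical synthetic data: one process variable and one strength property.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [20.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

preds = kfold_predictions(xs, ys, k=10)
print(f"RMSEP = {rmsep(ys, preds):.3f}, NRMSEP = {100 * nrmsep(ys, preds):.1f}%")
```

To compare imputed with non-imputed data in this style, the same out-of-fold procedure would be run once on each dataset and the two NRMSEP values compared.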