Academic Journal

A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites

Bibliographic Details
Title: A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites
Authors: Zeng, Yan
Source: Masters Theses
Publisher Information: Tennessee Research and Creative Exchange
Publication Year: 2011
Collection: University of Tennessee, Knoxville: Trace
Subject Terms: missing data imputation, predictive modeling, partial least squares regression, LASSO, Adaptive LASSO, BART, Applied Statistics, Statistical Methodology, Statistical Models
Description: Problem: Real-time process and destructive test data were collected from a wood composite manufacturer in the U.S. to develop real-time predictive models of two key strength properties, Modulus of Rupture (MOR) and Internal Bond (IB), of a wood composite manufacturing process. Sensor malfunctions and data “send/retrieval” problems led to null fields in the company’s data warehouse, which resulted in information loss. Many manufacturers attempt to build accurate predictive models either by excluding entire records with null fields or by substituting summary statistics such as the mean or median for the null field. However, predictive model errors in validation may be higher in the presence of information loss. In addition, the selection of a predictive modeling method poses another challenge to many wood composite manufacturers. Approach: This thesis consists of two parts addressing the above issues: 1) how to improve data quality using missing data imputation; 2) which predictive modeling method is better in terms of prediction precision (measured by root mean square error, or RMSE). The first part summarizes an application of missing data imputation methods in predictive modeling. After variable selection, two missing data imputation methods were selected from a comparison of six candidate methods. Predictive models of imputed data were developed using partial least squares regression (PLSR) and compared with models of non-imputed data using ten-fold cross-validation. Root mean square error of prediction (RMSEP) and normalized RMSEP (NRMSEP) were calculated. The second part presents a series of comparisons among four predictive modeling methods using imputed data without variable selection. Results: The first part concludes that the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov Chain Monte Carlo (MCMC) simulation achieved more precise results. Predictive models based on imputed datasets generated more precise prediction results (average NRMSEP of 5.8% for the MOR model and 7.2% for the IB model) than models ...
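Illustrative note: the abstract evaluates models by RMSEP and NRMSEP under ten-fold cross-validation. The sketch below shows how those two quantities can be computed; it is not the thesis's code. It makes several assumptions: ordinary least squares stands in for PLSR, NRMSEP is taken as RMSEP divided by the mean of the response (the abstract does not state the normalization), and the data, the function name `kfold_rmsep`, and all parameters are synthetic or hypothetical.

```python
import numpy as np

def kfold_rmsep(X, y, k=10, seed=0):
    """Return (RMSEP, NRMSEP) from k-fold cross-validation.

    Assumptions: the model is ordinary least squares (a stand-in for
    PLSR), and NRMSEP = RMSEP / mean(y), one common normalization.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    sq_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        # Add an intercept column and fit on the training folds only.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Predict the held-out fold and keep its squared errors.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        sq_errors.append((y[test_idx] - A_test @ coef) ** 2)
    rmsep = float(np.sqrt(np.mean(np.concatenate(sq_errors))))
    nrmsep = rmsep / float(np.mean(y))
    return rmsep, nrmsep

# Synthetic data (hypothetical; not from the thesis).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 5))
y = 50.0 + X @ np.array([3.0, -2.0, 1.0, 0.0, 0.5]) + data_rng.normal(size=200)

rmsep, nrmsep = kfold_rmsep(X, y)
print(f"RMSEP = {rmsep:.3f}, NRMSEP = {100 * nrmsep:.1f}%")
```

Running the same function on an imputed and a non-imputed version of a dataset would give the kind of side-by-side comparison the abstract describes.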
Document Type: text
File Description: application/pdf
Language: unknown
Relation: https://trace.tennessee.edu/utk_gradthes/1041; https://trace.tennessee.edu/context/utk_gradthes/article/2159/viewcontent/Yan_Zeng_Thesis_Final_Version.pdf
Availability: https://trace.tennessee.edu/utk_gradthes/1041
https://trace.tennessee.edu/context/utk_gradthes/article/2159/viewcontent/Yan_Zeng_Thesis_Final_Version.pdf
Accession Number: edsbas.EF679820
Database: BASE