Dissertation

Robust Dependence-Adjusted Methods for High Dimensional Data

Bibliographic details
Title: Robust Dependence-Adjusted Methods for High Dimensional Data
Authors: Bose, Koushiki
Contributors: Fan, Jianqing; Operations Research and Financial Engineering Department
Publication information: Princeton University
Publication year: 2018
Collection: DataSpace at Princeton University
Subject terms: Dependence Adjustment, Factor Models, High Dimensional Data, Robust Estimation, R package, Statistics
Description: The focus of this dissertation is the development, implementation and verification of robust methods for high dimensional heavy-tailed data, with an emphasis on underlying dependence-adjustment through factor models. First, we prove a nonasymptotic version of the Bahadur representation for a Huber loss M-estimator in the presence of heavy-tailed errors. Consequently, we prove a number of important normal approximation results, including the Berry-Esseen bound and Cramér-type moderate deviation. This theory is used to analyze a covariate-adjusted multiple testing procedure under moderately heavy-tailed errors. We prove that the procedure asymptotically controls the overall false discovery proportion at the nominal level. Next, we present the development of an R package that conducts factor-adjusted robust multiple testing of mean effects, even where the factors are unobservable or partially observable. Experiments on real and simulated datasets demonstrate the superior performance of our package. Applying this testing procedure to RNA-Seq data from autism patients, we find new evidence for the etiology of the disease and novel pathways that may be changed in autism. Many of the candidate genes found are responsible for functions affected by autism, or implicated in autism comorbidities like seizures and epilepsy. We observe differences between functions of genes implicated in male and female patients: promising results since autism is a heavily gender-biased disease. Next, we present an R package that performs large-scale model selection for high dimensional sparse regression in the presence of correlated covariates. The software implements a consistent model selection strategy when the covariate dependence can be reduced through factor models. Numerical studies show that it has nice finite-sample performance in terms of both model selection and out-of-sample prediction.
Finally, we present a novel method for estimating higher moments of multivariate elliptical distributions. Existing estimators typically require ...
Document type: doctoral or postdoctoral thesis
Language: English
Relation: The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu; http://arks.princeton.edu/ark:/88435/dsp01ht24wn13dTest
Availability: http://arks.princeton.edu/ark:/88435/dsp01ht24wn13dTest
Accession number: edsbas.3D4040BA
Database: BASE