Bibliographic Details
Title: |
Robust Dependence-Adjusted Methods for High Dimensional Data |
Authors: |
Bose, Koushiki |
Contributors: |
Fan, Jianqing, Operations Research and Financial Engineering Department |
Publication Details: |
Princeton University |
Publication Year: |
2018 |
Collection: |
DataSpace at Princeton University |
Subject Terms: |
Dependence Adjustment, Factor Models, High Dimensional Data, Robust Estimation, R package, Statistics |
Description: |
The focus of this dissertation is the development, implementation and verification of robust methods for high dimensional heavy-tailed data, with an emphasis on underlying dependence-adjustment through factor models. First, we prove a nonasymptotic version of the Bahadur representation for a Huber loss M-estimator in the presence of heavy-tailed errors. Consequently, we prove a number of important normal approximation results, including the Berry-Esseen bound and Cramér-type moderate deviation. This theory is used to analyze a covariate-adjusted multiple testing procedure under moderately heavy-tailed errors. We prove that the procedure asymptotically controls the overall false discovery proportion at the nominal level. Next, we present the development of an R package that conducts factor-adjusted robust multiple testing of mean effects, even where the factors are unobservable or partially observable. Experiments on real and simulated datasets demonstrate the superior performance of our package. Applying this testing procedure to RNA-Seq data from autism patients, we find new evidence for the etiology of the disease and novel pathways that may be changed in autism. Many of the candidate genes found are responsible for functions affected by autism, or implicated in autism comorbidities like seizures and epilepsy. We observe differences between functions of genes implicated in male and female patients: promising results since autism is a heavily gender-biased disease. Next, we present an R package that performs large-scale model selection for high dimensional sparse regression in the presence of correlated covariates. The software implements a consistent model selection strategy when the covariate dependence can be reduced through factor models. Numerical studies show that it has nice finite-sample performance in terms of both model selection and out-of-sample prediction. 
Finally, we present a novel method for estimating higher moments of multivariate elliptical distributions. Existing estimators typically require ... |
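The Description refers to a Huber loss M-estimator under heavy-tailed errors. As a generic illustration only (this is not code from the dissertation or its R packages), the following minimal Python sketch computes a Huber-loss estimate of location by iteratively reweighted averaging. It assumes NumPy is available. The function name `huber_mean`, the robustification parameter `tau`, and the toy data are hypothetical choices made for this example.

```python
import numpy as np


def huber_mean(x, tau, tol=1e-10, max_iter=500):
    """Huber-loss M-estimate of location (hypothetical helper, for illustration).

    Solves sum_i psi_tau(x_i - mu) = 0, where psi_tau(r) = r for |r| <= tau
    and tau * sign(r) otherwise, by iteratively reweighted averaging.
    """
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))  # robust starting value
    for _ in range(max_iter):
        r = x - mu
        # Huber weights: 1 inside [-tau, tau], tau / |r| outside,
        # so w * r equals psi_tau(r) for every observation.
        w = np.ones_like(r)
        far = np.abs(r) > tau
        w[far] = tau / np.abs(r[far])
        # Fixed-point update: weighted mean with the current Huber weights.
        mu_new = float(np.sum(w * x) / np.sum(w))
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 1000.0]  # one gross outlier
    print(np.mean(data))              # sample mean, pulled toward the outlier
    print(huber_mean(data, tau=1.0))  # Huber estimate stays near the bulk
```

On this toy input the sample mean is 202.0, while the Huber estimate with `tau = 1.0` is about 3; as `tau` grows, every observation receives weight 1 and the estimate returns to the sample mean. The value of `tau` here is arbitrary; how it should scale with sample size and tail behavior is the kind of question the dissertation's theory addresses.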
Document Type: |
doctoral or postdoctoral thesis |
Language: |
English |
Relation: |
The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog: catalog.princeton.edu; http://arks.princeton.edu/ark:/88435/dsp01ht24wn13d |
Availability: |
http://arks.princeton.edu/ark:/88435/dsp01ht24wn13d |
Accession Number: |
edsbas.3D4040BA |
Database: |
BASE |