Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/43427
Title: Variable Selection Procedures In Linear Regression Models
Authors: XIE YANXI
Keywords: Variable Selection; linear regression models; h-likelihood; Forward Regression; Orthogonal Matching Pursuit
Issue Date: 3-May-2013
Citation: XIE YANXI (2013-05-03). Variable Selection Procedures In Linear Regression Models. ScholarBank@NUS Repository.
Abstract: With the rapid development of the information technology industry, contemporary data from fields such as finance and gene expression tend to be extremely large, with the number of variables or parameters d much larger than the sample size n. For example, one may wish to associate protein concentrations with the expression of genes, or to predict survival time using gene expression data. In such high-dimensional problems, it is challenging to find the important variables among thousands of predictors when the number of observations is typically only in the tens or hundreds. In other words, investigating the existence of complex relationships and dependencies in data, with the aim of building a relevant model for inference, has become a major issue.

There are two fundamental goals in statistical learning: identifying the relevant predictors and ensuring high prediction accuracy. The first goal, pursued by means of variable selection, is of particular importance when the true underlying model has a sparse representation, since discovering the relevant predictors can enhance the predictive performance of the fitted model. An estimate $\hat{\beta}$ is usually considered desirable if it is consistent in terms of both coefficient estimation and variable selection. Hence, before estimating the regression coefficients $\beta$, it is preferable to have a set of useful predictors in hand. The emphasis of this thesis is on proposing methods that identify relevant predictors so as to ensure selection consistency, or screening consistency, in variable selection. The primary interest is in Orthogonal Matching Pursuit (OMP) and Forward Regression (FR), whose theoretical properties are investigated in detail.

Furthermore, we introduce a new penalized h-likelihood approach to identify the non-zero relevant fixed effects in the partially linear model setting. This approach incorporates variable selection procedures into mean modeling via the h-likelihood. The newly proposed method has several advantages. First, compared with the traditional marginal likelihood, the h-likelihood avoids intractable integration over the random effects and is therefore convenient to use. In addition, the h-likelihood plays an important role in inference for models with unobservable or unobserved random variables. Finally, simulation studies demonstrate that the proposed penalty-based method is able to identify zero regression coefficients when modeling the mean structure and produces good estimates of the fixed effects.
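Both OMP and FR are greedy forward-selection procedures: at each step they add one predictor to the active set and then re-fit on the selected columns. The following minimal Python/NumPy sketch of the OMP idea is illustrative only (it is not code from the thesis); the function name omp, the fixed step count k, and the assumption of standardized columns are hypothetical choices made here for clarity.

    import numpy as np

    def omp(X, y, k):
        # Illustrative sketch of Orthogonal Matching Pursuit (not from the thesis).
        # X: (n, d) design matrix, ideally with standardized columns;
        # y: (n,) response; k: number of greedy selection steps.
        residual = y.copy()
        selected = []
        for _ in range(k):
            # Pick the column most correlated with the current residual.
            scores = np.abs(X.T @ residual)
            scores[selected] = -np.inf  # exclude already-selected columns
            selected.append(int(np.argmax(scores)))
            # Refit least squares on all selected columns, update the residual.
            beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
            residual = y - X[:, selected] @ beta
        return selected

Forward Regression differs only in the selection rule: rather than ranking candidates by marginal correlation with the residual, each remaining candidate is evaluated by the residual sum of squares after a full least-squares refit that includes it, and the candidate yielding the smallest residual sum of squares is added.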
URI: http://scholarbank.nus.edu.sg/handle/10635/43427
Appears in Collections: Ph.D Theses (Open)

Files in This Item:
File: thesis.pdf (1.05 MB, Adobe PDF); Access Settings: OPEN

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.