B02: Improving Predictive Power with Pre-Tuned Principal Component Regression: A Case Study

Linear regression is a commonly used approach to discovering associations between variables in many fields. The least squares method is the standard way to estimate the unknown regression coefficients, and under the Gauss-Markov assumptions the ordinary least squares (OLS) estimator is the best linear unbiased estimator (BLUE). OLS requires the Gram matrix of the design matrix to be invertible. However, multicollinearity is a common issue in real-world data, particularly in high-dimensional datasets. When the independent variables are highly correlated, the Gram matrix is ill-conditioned and computing its inverse numerically becomes problematic. When the design matrix does not have full rank, the Gram matrix is singular and cannot be inverted at all. As a result, the variance of the coefficient estimates can be inflated, and in some cases the regression coefficients become unstable or non-unique. Moreover, including redundant predictors in the model can lead to overfitting, which not only hurts out-of-sample prediction but also makes the model hard to interpret and obscures the relationship between the independent and dependent variables. Principal component regression (PCR) addresses this issue by regressing on principal components instead of the original independent variables. However, traditional PCR is computationally inefficient and has low predictive power. In this project, we propose a flexible approach that improves prediction performance while shortening computing time. The main idea is to approximate the indicator threshold function with a smooth, flexible sigmoid surrogate function, which we do in several different ways. This approximation turns the discrete model selection procedure into a continuous optimization problem. In our extensive simulation studies, the proposed method yields better predictive performance than the other variants of pre-tuned PCR, especially for data with large sample sizes.
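
To make the sigmoid-surrogate idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the surrogate is parameterized, so this sketch assumes that components are selected by thresholding the eigenvalues of the sample covariance matrix. The names `pcr_coefficients`, `c` (threshold), and `a` (sharpness) are hypothetical. It compares hard selection with the indicator I(lambda_j >= c) against soft selection with sigmoid(a * (lambda_j - c)) on simulated data containing two nearly collinear predictors.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument so np.exp cannot overflow for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def pcr_coefficients(X, y, weight_fn):
    """PCR coefficients on the original (centered) predictors.

    weight_fn maps the covariance eigenvalues to per-component weights:
    0/1 weights give classical PCR, weights in (0, 1) give a smooth version.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)      # eigenvalues of the sample covariance
    weights = weight_fn(eigvals)
    # Regress y on each principal component; skip numerically null directions.
    keep = s > 1e-10 * s.max()
    gamma = np.zeros_like(s)
    gamma[keep] = (U[:, keep].T @ yc) / s[keep]
    beta = Vt.T @ (weights * gamma)        # map back to the original predictors
    return beta, eigvals, weights

# Simulated data (illustrative only): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.5 * x3 + rng.normal(size=n)

c = 0.1      # assumed eigenvalue threshold
a = 1000.0   # assumed sharpness; larger values approach the indicator

beta_hard, eigvals, w_hard = pcr_coefficients(
    X, y, lambda ev: (ev >= c).astype(float))
beta_soft, _, w_soft = pcr_coefficients(
    X, y, lambda ev: sigmoid(a * (ev - c)))

print("eigenvalues: ", eigvals)
print("hard weights:", w_hard)
print("soft weights:", w_soft)
print("hard PCR beta:", beta_hard)
print("soft PCR beta:", beta_soft)
```

Because the soft weights are differentiable in `c` and `a`, the selection rule could in principle be tuned with a continuous optimizer rather than by searching over subsets of components. With a large `a`, the soft weights are numerically indistinguishable from the 0/1 indicator, so the two coefficient vectors coincide.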

Author(s): My Nguyen, Data Science & Statistics Major

Advisor(s): Pei Wang, Department of Statistics
