Continuum Regression illustrates differences between PCR, PLS and MLR
Sep 18, 2009
There has been a discussion this week on the International Chemometrics Society List (ICS-L) involving differences between Principal Components Regression (PCR), Partial Least Squares (PLS) regression and Multiple Linear Regression (MLR). This has brought up the subject of Continuum Regression (CR), which is one of my favorite topics!
I got into CR when I was working on my dissertation as it was a way to unify PCR, PLS and MLR so that their similarities and differences might be more thoroughly understood. CR is a continuously adjustable regression technique that encompasses PLS and includes PCR and MLR as opposite ends of the continuum. In CR the regression vector is a function of the continuum parameter and of how many Latent Variables (LVs) are included in the model.
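To make that dependence concrete, here is a minimal Python sketch in the spirit of the continuum power regression variant from the second reference at the end of this post. It is an illustration under stated assumptions, not the PLS_Toolbox implementation. The function name `continuum_power_regression`, the argument names `power` and `n_lv`, and the parameterization (power 0 toward MLR, 1 for PLS, large values toward PCR) are all choices made for this example. The idea: raise the singular values of X to a power, extract PLS1 scores from the modified X, then map the fit back to a regression vector for the original X.

```python
import numpy as np

def continuum_power_regression(X, y, power, n_lv):
    """Illustrative sketch: regression vector as a function of power and n_lv.

    Assumes X and y are already mean-centered and y is a single column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > 1e-10 * s[0]                  # drop numerically null directions
    U, s, Vt = U[:, keep], s[keep], Vt[keep]

    # Modified X: same singular vectors, singular values raised to `power`.
    # Dividing by s[0] only rescales X and avoids overflow for large powers.
    Xg = (U * (s / s[0]) ** power) @ Vt

    scores = []
    first_norm = None
    for _ in range(min(n_lv, s.size)):
        w = Xg.T @ y                         # PLS1 weight: covariance with y
        norm = np.linalg.norm(w)
        if first_norm is None:
            first_norm = norm
        if norm <= 1e-12 * first_norm:       # nothing left to explain in y
            break
        w /= norm
        t = Xg @ w                           # score vector
        Xg = Xg - np.outer(t, t @ Xg) / (t @ t)  # deflate the modified X
        scores.append(t)

    if not scores:                           # y has no covariance with X
        return np.zeros(X.shape[1])

    # Fit y on the scores, then find b for the ORIGINAL X with X @ b = y_fit.
    T = np.column_stack(scores)
    coef, *_ = np.linalg.lstsq(T, y, rcond=None)
    y_fit = T @ coef
    return Vt.T @ ((U.T @ y_fit) / s)

# Small synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=40)
y -= y.mean()

b_mlr = np.linalg.lstsq(X, y, rcond=None)[0]
b_cr0 = continuum_power_regression(X, y, power=0.0, n_lv=1)
b_pls_full = continuum_power_regression(X, y, power=1.0, n_lv=5)
```

In this sketch, `power=0.0` with a single LV reproduces the least-squares (MLR) regression vector, and `power=1.0` with as many LVs as X has columns also lands on the MLR solution. Sweeping `power` and `n_lv` over a grid and computing the error on held-out data is one way to trace out a prediction error surface.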
The continuum regression prediction error surface has been the logo for PLS_Toolbox since version 1.5 (1995). We use it in our courses because it graphically illustrates several important aspects of these regression methods. The CR prediction error surface is shown below.
The CR prediction error surface illustrates the following points:
1) Models with enough LVs converge to the MLR solution. There is a region of models near the MLR side with many LVs that all have the same prediction error; this is the flat green surface on the right. These models have enough LVs in them that they have all converged to the MLR solution.
2) Models with too few or irrelevant factors have large prediction error. The large errors on the left are from models with few factors nearer the PCR side. These models don’t have enough factors (or at least enough of the right factors) in them to be very predictive. Note also the local maximum along the PCR edge at 4 LVs. This illustrates that some PCR factors are often not relevant for prediction (no surprise, as they are determined only by the amount of predictor X variance captured), so they lead to larger errors when included.
3) PLS is much less likely to have irrelevant factors than PCR. The prediction error curve for PLS doesn’t show these local maxima because of the way the factors are determined using information from the predicted variable y, i.e. they are based on covariance.
4) PLS models have fewer factors than PCR models. The bottom of the trough, aka the “valley of best models” runs at an angle through the surface, showing that PLS models generally hit their minimum prediction error with fewer factors than PCR models. Of course, near the MLR extreme, they hit the minimum with just 1 factor!
5) PLS models are not more parsimonious than PCR models. The angle of the valley of best models through the surface illustrates that PLS models are not really more parsimonious just because they have fewer factors. They are, in fact, “trying harder” to be correlated with the predicted variable y and (as Hilko van der Voet has shown) are consuming more degrees of freedom in doing so. If PLS were more parsimonious than PCR just based on the number of factors, then the 1 factor CR model near MLR would be even better, and we know it’s not!
CR has been in PLS_Toolbox since the first versions. It is not a terribly practical technique, however, as it is difficult to choose the “best” place to be in the continuum. But it really illustrates the differences between the methods nicely. If you have an interest in CR, I suggest you see:
B. M. Wise and N. L. Ricker, “Identification of Finite Impulse Response Models with Continuum Regression,” Journal of Chemometrics, 7(1), pp. 1-14, 1993.
S. de Jong, B. M. Wise and N. L. Ricker, “Canonical Partial Least Squares and Continuum Power Regression,” Journal of Chemometrics, 15(2), pp. 85-100, 2001.