
Switch from MLR to PLS?

Apr 20, 2011

In a recent post, Fernando Morgado posed the following question to the NIR Discussion List: “When is it necessary to move from traditional Multiple Linear Regression (MLR) to Partial Least Squares Regression (PLS)?” That’s a good discussion starter! I’ll offer my take on it here. I have several answers to the question, but before I get to them, it is useful to outline the differences between MLR and PLS.

MLR, PLS, Principal Components Regression (PCR) and a number of other methods are all Inverse Least Squares (ILS) models. Given a set of predictors, $X$ ($m$ samples by $n$ variables), and a variable to be predicted, $y$ ($m$ by 1), they find a regression vector $b$ ($n$ by 1) such that the estimate of $y$ is $\hat{y} = Xb$, with $b = X^{+}y$. The methods differ in how they estimate $X^{+}$, the pseudoinverse of $X$.

In MLR, $X^{+} = (X^{T}X)^{-1}X^{T}$. In PCR, the data matrix is first decomposed via Principal Components Analysis (PCA) as $X = T_k P_k^{T} + E$, where $T_k$ is the ($m$ by $k$) matrix containing the scores on the first $k$ PCs, $P_k$ is the ($n$ by $k$) matrix containing the first $k$ PC loadings, and $E$ is a matrix of residuals. The pseudoinverse is then $X^{+} = P_k(T_k^{T}T_k)^{-1}T_k^{T}$. The number of PCs, $k$, is determined via cross-validation or any of a number of other methods. In PLS, the decomposition of $X$ is somewhat more complicated, and the resulting inverse is $X^{+} = W_k(P_k^{T}W_k)^{-1}(T_k^{T}T_k)^{-1}T_k^{T}$, where $T_k$ and $P_k$ are now the PLS scores and loadings and the additional parameter $W_k$ ($n$ by $k$) is known as the weights.
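The three pseudoinverses can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes NumPy is available, that X and y are mean-centered, and that PLS is computed with the NIPALS algorithm for a single y. The function names (`pinv_mlr`, `pinv_pcr`, `pinv_pls`) and the simulated data are hypothetical.

```python
import numpy as np

def pinv_mlr(X):
    """MLR: X+ = (X^T X)^-1 X^T. Requires X to have full column rank."""
    return np.linalg.solve(X.T @ X, X.T)

def pinv_pcr(X, k):
    """PCR: X+ = P_k (T_k^T T_k)^-1 T_k^T, with PCA computed via the SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :k] * s[:k]   # scores, m by k
    P = Vt[:k].T           # loadings, n by k
    return P @ np.linalg.solve(T.T @ T, T.T)

def pinv_pls(X, y, k):
    """PLS (NIPALS, single y): X+ = W_k (P_k^T W_k)^-1 (T_k^T T_k)^-1 T_k^T."""
    m, n = X.shape
    W = np.zeros((n, k))
    P = np.zeros((n, k))
    T = np.zeros((m, k))
    Xd = X.copy()                      # deflated copy of X
    for a in range(k):
        w = Xd.T @ y                   # weight: direction of covariance with y
        w = w / np.linalg.norm(w)
        t = Xd @ w                     # score
        p = Xd.T @ t / (t @ t)         # loading
        Xd = Xd - np.outer(t, p)       # remove this component from X
        W[:, a], P[:, a], T[:, a] = w, p, t
    return W @ np.linalg.solve(P.T @ W, np.linalg.solve(T.T @ T, T.T))

# Simulated, mean-centered data (illustrative only)
rng = np.random.default_rng(0)
m, n, k = 30, 4, 2
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.05 * rng.normal(size=m)
y -= y.mean()

# In every case the regression vector is b = X+ y
b_mlr = pinv_mlr(X) @ y
b_pcr = pinv_pcr(X, k) @ y
b_pls = pinv_pls(X, y, k) @ y
```

Note that only the PLS pseudoinverse depends on y: the weights are chosen to capture covariance between X and y, whereas the PCR decomposition looks at X alone.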

With that background covered, we can now consider “When is it necessary to move from traditional Multiple Linear Regression (MLR) to Partial Least Squares Regression (PLS)?”

1) Any time the rank of the spectral data matrix $X$ is less than the number of variables. The mathematical rank of a matrix is well defined as the number of linearly independent rows or columns. It is important because the MLR solution includes the term $(X^{T}X)^{-1}$ in the pseudoinverse. If $X$ has rank less than the number of variables, $X^{T}X$ has rank less than its dimension, i.e. is rank deficient, and its inverse is undefined. PCR and PLS avoid this problem by decomposing $X$ and forming a solution based on the large variance (stable) part of the decomposition. From this, it is clear that another answer to the question must be:

2) Any time the data contains fewer samples than variables. This is a common problem in spectroscopy because many instruments measure hundreds or thousands of variables (channels), but acquiring that many samples can be a very expensive proposition. The obvious follow-up question is, “Then why not just reduce the number of variables?” The answer, in short, is noise reduction: averaging correlated measurements reduces noise, so discarding variables throws away that benefit.
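The first two answers, and the noise-averaging argument, can be checked numerically. The sketch below assumes NumPy and uses simulated data throughout. In particular, the "channels" in the last part are assumed to measure the same underlying signal with independent noise of equal size, which is an idealization of correlated spectral variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Answer 1: more samples than variables, but one column is an exact
# linear combination of two others, so rank(X) < n.
X1 = rng.normal(size=(20, 5))
X1[:, 4] = X1[:, 0] + X1[:, 1]
rank_X1 = np.linalg.matrix_rank(X1)        # 4 rather than 5
cond_X1tX1 = np.linalg.cond(X1.T @ X1)     # enormous: X^T X is numerically singular

# Answer 2: fewer samples than variables, so rank(X) <= m < n
# and X^T X (n by n) cannot be inverted.
m, n = 10, 50
X2 = rng.normal(size=(m, n))
y2 = rng.normal(size=m)                    # placeholder response, illustrative only
rank_X2 = np.linalg.matrix_rank(X2)        # at most m

# A PCR solution with k components still exists. With the SVD X = U S V^T,
# P_k (T_k^T T_k)^-1 T_k^T y simplifies to V_k (U_k^T y) / s_k.
k = 3
U, s, Vt = np.linalg.svd(X2, full_matrices=False)
b_pcr = Vt[:k].T @ ((U[:, :k].T @ y2) / s[:k])

# Noise reduction: averaging n_channels readings of the same signal
# shrinks the noise by roughly sqrt(n_channels).
sigma, n_channels, n_trials = 1.0, 100, 5000
readings = 3.0 + sigma * rng.normal(size=(n_trials, n_channels))
noise_single = readings[:, 0].std()            # close to sigma
noise_averaged = readings.mean(axis=1).std()   # close to sigma / sqrt(n_channels)
```

Factor-based methods do this averaging implicitly: each score is a weighted sum over all channels, so keeping the full spectrum lets the model average down noise that a small hand-picked subset of variables would retain.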

But what about the case where $m > n$ but $X$ is nearly rank deficient, that is, where $X$ is of full rank only because it is corrupted by noise? This leads to:

3) Any time the chemical rank of the spectral data matrix $X$ is less than the number of variables. By chemical rank we mean the number of independent sources of variation in the data that are due to chemical variation, as opposed to detector noise and other minor effects.
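The gap between mathematical rank and chemical rank shows up clearly in the singular values. The following sketch assumes NumPy and builds synthetic data (not real spectra) from three "chemical" components plus a small amount of Gaussian noise. The threshold used to count significant singular values assumes the noise level is known, which is rarely true in practice; it is only one crude heuristic, and cross-validation is the more usual tool.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, n_chem, sigma = 50, 20, 3, 0.01

C = rng.normal(size=(m, n_chem))   # synthetic "concentrations"
S = rng.normal(size=(n, n_chem))   # synthetic "pure-component spectra"
X = C @ S.T + sigma * rng.normal(size=(m, n))   # rank-3 signal plus noise

# Mathematically, the noise makes X full rank.
math_rank = np.linalg.matrix_rank(X)

# The singular values tell a different story: three large ones,
# and the rest sitting at the noise level.
s = np.linalg.svd(X, compute_uv=False)

# The largest singular value of an m by n pure-noise matrix is roughly
# sigma * (sqrt(m) + sqrt(n)); count the values well above that.
noise_floor = sigma * (np.sqrt(m) + np.sqrt(n))
chem_rank = int(np.sum(s > 2 * noise_floor))

# X is full rank but badly conditioned, which is what destabilizes MLR.
cond_X = s[0] / s[-1]
```

Here `math_rank` equals the number of variables while `chem_rank` recovers the three underlying components, and the large condition number indicates that an MLR solution would lean heavily on directions that contain only noise.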

So if any of the above three conditions exists, then it is appropriate to move from MLR to a factor-based method such as PLS or PCR. But I’m going to play devil’s advocate here a bit and give one more answer:

4) Always. MLR is, after all, just a special case of PLS and PCR. If you include all the components (and $X$ has full column rank, so that the MLR solution exists), you arrive at the same model that MLR gives you. But along the way you get additional diagnostic information from which you might learn something. The scores and loadings (and weights) all give you information about the chemistry.
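The special-case claim is easy to verify for PCR. The sketch below assumes NumPy, simulated mean-centered data with full column rank, and a hypothetical helper `b_pcr`; it tracks how far the PCR regression vector is from the MLR one as components are added. PLS reaches the same endpoint when all components are included, although its intermediate models differ.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 6
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)
y = X @ rng.normal(size=n) + 0.1 * rng.normal(size=m)
y -= y.mean()

# MLR regression vector: b = (X^T X)^-1 X^T y
b_mlr = np.linalg.solve(X.T @ X, X.T @ y)

# PCA of X via the SVD, computed once
U, s, Vt = np.linalg.svd(X, full_matrices=False)

def b_pcr(k):
    """PCR regression vector using the first k principal components."""
    T = U[:, :k] * s[:k]   # scores
    P = Vt[:k].T           # loadings
    return P @ np.linalg.solve(T.T @ T, T.T @ y)

# Distance from the MLR solution for k = 1, ..., n
gaps = [np.linalg.norm(b_pcr(k) - b_mlr) for k in range(1, n + 1)]
```

The entries of `gaps` never increase as k grows, and the last one (k = n) is zero to within rounding error. The difference from plain MLR is that the scores `T` and loadings `P` are available at every step for inspection.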

On the NIR Discussion list, Donald J. Dahm wrote: “As a grumpy old man, I say the time to switch to PLS is when you are ready to admit that you don’t have the knowledge or patience to do actual spectroscopy.” I hope that was said with tongue firmly planted in cheek, because I’d argue that the opposite is true. When you are using PLS or PCR and interrogating the models, you are learning about the chemistry. When you use MLR, you are simply fitting the data.

BMW