Machine Learning for Chemometricians
ANNs, SVMs, XGBoost and other Non-linear Methods for Calibration and Classification
Course Description
While linear methods, such as PLS regression, work well across a very wide range of problems of chemical interest, there are times when the relationships between variables are complex and require non-linear modeling methods. Many non-linear methods have been developed; this course focuses on a few that we have found quite useful when dealing with data from chemical systems. The course begins with a discussion of linearizing transforms. Augmenting the data with non-linear transforms, e.g. polynomials, is discussed next. It is then shown how Locally Weighted Regression (LWR) and Hierarchical Models (HM) can handle non-linearity by using linear sub-models. More difficult non-linear relationships can be handled using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and gradient-boosted ensemble methods (XGBoost), for both regression and classification analysis. These methods are explained in detail, and the meta-parameters associated with them are discussed. The course includes hands-on computer time for participants to work example problems using PLS_Toolbox or Solo.
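To give a rough idea of what a linearizing transform does, the sketch below fits a straight line to synthetic data that follow an exponential curve, first on the raw response and then on its logarithm. It is only an illustration in plain Python, not course material and not PLS_Toolbox or Solo code; the data, the helper names `fit_line` and `rmse`, and the choice of a log transform are all assumptions made for the example.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def rmse(y, yhat):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

# Synthetic, noise-free data following y = 2 * exp(0.5 * x) (illustrative only).
x = [0.5 * i for i in range(11)]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Straight line fitted to the raw response: systematic lack of fit.
b0, b1 = fit_line(x, y)
rmse_raw = rmse(y, [b0 + b1 * xi for xi in x])

# Taking the log of the response makes the relationship linear:
# ln(y) = ln(2) + 0.5 * x
c0, c1 = fit_line(x, [math.log(yi) for yi in y])
rmse_log = rmse(y, [math.exp(c0 + c1 * xi) for xi in x])

print(f"RMSE, line on raw y:      {rmse_raw:.3f}")
print(f"RMSE, line on ln(y):      {rmse_log:.3g}")
print(f"recovered rate constant:  {c1:.3f}")
print(f"recovered prefactor:      {math.exp(c0):.3f}")
```

With real, noisy data the transform also changes the error structure, so the back-transformed fit is not guaranteed to be the best one in the original units.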
Prerequisites
Linear Algebra for Chemometricians, MATLAB for Chemometricians, Chemometrics I: PCA, Chemometrics II: Regression and PLS, or equivalent experience.
Course Outline
- Introduction
– Why non-linear methods?
– How linear methods deal with non-linear data
- Variable Transformations
– Log, sqrt, etc.
– Augmenting with non-linear transforms
- Factor based transforms
– PCA Scores and Augmenting
– Polynomial PLS
- Locally Weighted Regression
– Weighted Regression
– Distance Measures
– Basing Models on PCA Scores
- Hierarchical Models
– Dividing regressions into domains
- Support Vector Machines
– Classification and Regression Models
- Artificial Neural Networks
– Classification and Regression Models
- Gradient Boosted Decision Trees
– Classification and Regression Ensemble Models
- Choosing the right method
– Prediction skill
– Computational performance
– Deployment options
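
As a rough illustration of the Locally Weighted Regression topic in the outline above, the sketch below predicts a point on a sine curve from a weighted straight-line fit to its nearest neighbors and compares the result with a single global line. It is a minimal one-variable sketch in plain Python, not how PLS_Toolbox or Solo implement LWR; the function names, the absolute-difference distance measure, the tricube weights, the bandwidth factor of 1.1, and `n_local=7` are all assumptions chosen for the example.

```python
import math

def weighted_line_predict(x, y, w, x_query):
    """Fit y = b0 + b1*x by weighted least squares and evaluate it at x_query."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x_query - mx)

def lwr_predict(x_train, y_train, x_query, n_local=7):
    """Locally weighted prediction from the n_local nearest calibration points.

    Assumes the neighbors are not all located exactly at x_query.
    """
    # Distance measure (assumed here): absolute difference in x.
    nearest = sorted(zip(x_train, y_train),
                     key=lambda p: abs(p[0] - x_query))[:n_local]
    xs = [p[0] for p in nearest]
    ys = [p[1] for p in nearest]
    d = [abs(xi - x_query) for xi in xs]
    # Bandwidth set slightly beyond the farthest neighbor so that every
    # selected point keeps a positive weight.
    h = 1.1 * max(d)
    # Tricube weights: close to 1 near the query, decaying toward 0 at h.
    w = [(1 - (di / h) ** 3) ** 3 for di in d]
    return weighted_line_predict(xs, ys, w, x_query)

# Synthetic, noise-free calibration data on a sine curve (illustrative only).
x_train = [0.1 * i for i in range(64)]
y_train = [math.sin(xi) for xi in x_train]

x_query = 1.55
y_true = math.sin(x_query)

# One global straight line: all weights equal.
y_global = weighted_line_predict(x_train, y_train, [1.0] * len(x_train), x_query)
# Local linear sub-model built around the query point.
y_local = lwr_predict(x_train, y_train, x_query)

print(f"true value:        {y_true:.3f}")
print(f"global line error: {abs(y_global - y_true):.3f}")
print(f"LWR error:         {abs(y_local - y_true):.3f}")
```

For multivariate data such as spectra, the distance would typically be computed in a reduced space (the outline mentions PCA scores) rather than on a single raw variable, and the number of local points becomes a meta-parameter to tune.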