Machine Learning for Chemometricians

ANNs, SVMs, XGBoost and other Non-linear Methods for Calibration and Classification

Course Description

While linear methods, such as PLS regression, work well on a very wide range of problems of chemical interest, there are times when the relationships between variables are complex and require non-linear modeling methods. Many non-linear methods have been developed; we will focus on a few that we have found quite useful when dealing with data from chemical systems. The course begins with a discussion of linearizing transforms. Augmenting the data with non-linear transforms, e.g. polynomials, is discussed next. It is then shown how Locally Weighted Regression (LWR) and Hierarchical Models (HM) can handle non-linearity by using linear sub-models. More difficult non-linear relationships can be handled using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and gradient-boosted ensemble methods (XGBoost) for both regression and classification analysis. These methods are explained in detail, and the meta-parameters associated with them are discussed. The course includes hands-on computer time for participants to work example problems using PLS_Toolbox or Solo.
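
To make the idea of a linearizing transform concrete, here is a minimal Python sketch. It is not taken from the course materials or from PLS_Toolbox; the synthetic data and the helper names `fit_line` and `rmse` are assumptions chosen for illustration. It fits a straight line to an exponential response, first on the raw values and then after a log transform.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Synthetic, noise-free data (illustrative assumption): y = 2.0 * exp(0.8 * x)
x = [0.1 * i for i in range(1, 31)]
y = [2.0 * math.exp(0.8 * xi) for xi in x]

# Straight line fitted directly to the curved response
slope_raw, icpt_raw = fit_line(x, y)
pred_raw = [icpt_raw + slope_raw * xi for xi in x]

# Same fit after the linearizing transform: log(y) = log(2.0) + 0.8 * x
slope_log, icpt_log = fit_line(x, [math.log(yi) for yi in y])
pred_log = [math.exp(icpt_log + slope_log * xi) for xi in x]

rmse_raw = rmse(y, pred_raw)
rmse_log = rmse(y, pred_log)
print(f"RMSE, line fitted to raw y:  {rmse_raw:.4f}")
print(f"RMSE, line fitted to log(y): {rmse_log:.4f}")
```

Because the synthetic response is exactly exponential, the log-transformed fit recovers the generating slope and pre-factor, while the line fitted to the raw values leaves a large systematic residual. With real, noisy data the transform also changes the error structure, so the improvement is usually less clear-cut.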

Prerequisites

Linear Algebra for Chemometricians, MATLAB for Chemometricians, Chemometrics I – PCA, Chemometrics II – Regression and PLS, or equivalent experience.

Course Outline

  1. Introduction
    – Why non-linear methods?
    – How linear methods deal with non-linear data
  2. Variable Transformations
    – Log, sqrt, etc.
    – Augmenting with non-linear transforms
  3. Factor-Based Transforms
    – PCA Scores and Augmenting
    – Polynomial PLS
  4. Locally Weighted Regression
    – Weighted Regression
    – Distance Measures
    – Basing Models on PCA Scores
  5. Hierarchical Models
    – Dividing regressions into domains
  6. Support Vector Machines
    – Classification and Regression Models
  7. Artificial Neural Networks
    – Classification and Regression Models
  8. Gradient Boosted Decision Trees
    – Classification and Regression Ensemble Models
  9. Choosing the Right Method
    – Prediction skill
    – Computational performance
    – Deployment options
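
As a rough illustration of the Locally Weighted Regression topic in item 4, the Python sketch below builds a small weighted linear sub-model around each query point. It is a simplified, one-variable version under stated assumptions, not the PLS_Toolbox implementation: the tricube weighting, the Euclidean (absolute) distance, the neighbourhood size `n_local`, and the helper names `weighted_line` and `lwr_predict` are all illustrative choices.

```python
import math

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x

def lwr_predict(x_train, y_train, x_query, n_local=15):
    """Predict at x_query from a linear sub-model fitted to the nearest samples."""
    # Distance measure: absolute distance in the single predictor variable
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_query))[:n_local]
    xs = [p[0] for p in nearest]
    ys = [p[1] for p in nearest]
    # Tricube weights: close to 1 near the query, close to 0 at the edge
    d_max = max(abs(xi - x_query) for xi in xs)
    scale = d_max * 1.0001 + 1e-12
    w = [(1.0 - (abs(xi - x_query) / scale) ** 3) ** 3 for xi in xs]
    slope, intercept = weighted_line(xs, ys, w)
    return intercept + slope * x_query

# Synthetic non-linear data (illustrative assumption): y = sin(x) on [0, 2*pi]
x_train = [2.0 * math.pi * i / 99 for i in range(100)]
y_train = [math.sin(xi) for xi in x_train]
x_query = math.pi / 2.0  # the true response here is 1.0

# One global straight line versus a local sub-model at the query point
g_slope, g_icpt = weighted_line(x_train, y_train, [1.0] * len(x_train))
global_pred = g_icpt + g_slope * x_query
local_pred = lwr_predict(x_train, y_train, x_query)
print(f"global line: {global_pred:.3f}, local model: {local_pred:.3f}, truth: 1.000")
```

The single global line cannot follow the sine curve, whereas the local sub-model tracks it closely near the query point. The cost is that a new sub-model is fitted for every prediction, and `n_local` becomes a meta-parameter that has to be tuned.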