Advanced Preprocessing for Spectroscopic Applications

Course Description

The objective of data preprocessing is to remove extraneous variance and anomalies, and it is often the critical step in developing a successful multivariate calibration or classification scheme.

Preprocessing is often the critical step in the development of multivariate regression and classification models. Spectroscopic data pose unique problems, and also opportunities, because of their highly structured nature. The objective of spectroscopic data preprocessing is to maximize signal-to-clutter (S/C), where clutter is defined as extraneous variance and data anomalies that can 'distract' model development. Maximizing S/C is a different paradigm from maximizing signal-to-noise, and a firm understanding of the preprocessing algorithms and objectives can lead to more efficient and effective model development.

Advanced Preprocessing for Spectroscopic Applications starts with a brief review of basic preprocessing methods to demonstrate how they work within the objective of maximizing S/C and how they can be misused. The course then delves into more advanced topics such as multiplicative scatter correction, extended multiplicative scatter correction, and generalized least squares-like weighting. Examples focus on spectroscopic applications, although many methods extend directly to other types of data. The mathematical principles behind the preprocessing methods are also covered. The course includes hands-on computer time for participants to work example problems using PLS_Toolbox, EMSC_Toolbox, and MATLAB.

Prerequisites

Linear Algebra for Chemometricians, MATLAB for Chemometricians, and Chemometrics II--Regression and PLS, or equivalent experience.

Course Outline

  1. Preprocessing Objectives
  2. Matrix Rank and the Bilinear Model
  3. Mean- and Median-centering, Autoscaling
  4. Normalization and Standard Normal Variate Scaling
  5. Scaling for Multi-block data
  6. Savitzky-Golay and Filtering
  7. Multiplicative Scatter Correction (MSC)
  8. External Parameter Orthogonalization (EPO)
  9. Extended Multiplicative Scatter Correction (EMSC)
  10. Generalized Least Squares and GLS-like Weighting
  11. Orthogonal Signal Correction (OSC) and Orthogonal-PLS
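
As a rough illustration of two outline topics, Standard Normal Variate scaling (item 4) and Multiplicative Scatter Correction (item 7), the following is a minimal sketch in plain Python. It is not course material and does not use PLS_Toolbox, EMSC_Toolbox, or MATLAB. The function names `snv` and `msc`, the choice of the mean spectrum as the MSC reference, and the toy spectra are all assumptions made for this example.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum
    by its own mean and (sample) standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction (basic form).

    Each spectrum x is fit by least squares as x ~ offset + slope * reference,
    then corrected as (x - offset) / slope. The reference defaults to the
    mean spectrum of the set (an assumption; other references are possible).
    """
    if reference is None:
        reference = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        slope = sum((r - ref_mean) * (v - x_mean)
                    for r, v in zip(reference, x)) / ref_ss
        offset = x_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in x])
    return corrected


# Toy data: one underlying band shape observed with two different
# additive offsets and multiplicative gains (hypothetical values).
base = [0.1, 0.4, 0.9, 0.4, 0.1]
spectra = [
    [0.05 + 0.8 * v for v in base],
    [0.20 + 1.3 * v for v in base],
]

snv_spectra = [snv(x) for x in spectra]
msc_spectra = msc(spectra)
```

In this toy case the two spectra differ only by an offset and a gain, so both corrections map them onto a common shape. With real spectra, the chemical signal also varies between samples, and part of it can be removed along with the scatter, which is one way such methods can be misused.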