Variable Selection

Course Description

Variable Selection deals with one of the most difficult problems in chemometrics: selecting variables for regression and classification. In many model-building situations, variable selection is useful for improving predictions, for minimizing the number of variables, or for other purposes such as reducing costs. But how should it be done? Genetic algorithms, forward selection, and jack-knifing are just a few of the possible approaches. This short course presents the theory behind when to use which method and outlines the different possible approaches. Examples and exercises show how some approaches work well in some situations and not in others. The course includes hands-on computer time for participants to work example problems using PLS_Toolbox and MATLAB.

Prerequisites

MATLAB for Chemometricians and Chemometrics II--Regression and PLS or equivalent experience.

Course Outline

  1. Motivation and Preliminary Examples: Why select variables?
  2. Available Variable Selection Methods:
    A priori
    A posteriori
    Model based, e.g. on loadings
    Genetic algorithms
    Classical
      Forward, backward selection
      Best subset selection
      Significance tests
      Significance based on Jack-knife
    i-PLS
  3. How to choose a variable selection method
  4. Variable selection in practice
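
To give a flavor of the classical methods listed in the outline, forward selection can be sketched in a few lines. This is only an illustrative sketch and not course material or PLS_Toolbox code: it is written in plain Python rather than MATLAB, the function name forward_selection and the simulated data are hypothetical, and it assumes an ordinary least-squares model with the reduction in residual sum of squares as the selection criterion. In practice a stopping rule such as cross-validation or a significance test would usually replace the fixed max_vars.

```python
import random

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def center(v):
    """Subtract the mean, which accounts for the model intercept."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def forward_selection(columns, y, max_vars):
    """Greedy forward selection for a least-squares model.

    columns: list of variables, each a list of n observations.
    At every step the variable giving the largest drop in residual
    sum of squares is added; the residual and the remaining variables
    are then orthogonalized against it.
    """
    residual = center(y)
    candidates = {j: center(c) for j, c in enumerate(columns)}
    selected = []
    while candidates and len(selected) < max_vars:
        def gain(j):
            c = candidates[j]
            cc = dot(c, c)
            # RSS reduction from adding variable j to the current model
            return dot(c, residual) ** 2 / cc if cc > 1e-12 else 0.0

        best = max(candidates, key=gain)
        q = candidates.pop(best)
        qq = dot(q, q)
        if qq <= 1e-12:
            break  # nothing left that adds information
        selected.append(best)
        b = dot(q, residual) / qq
        residual = [r - b * qi for r, qi in zip(residual, q)]
        for j in list(candidates):
            g = dot(q, candidates[j]) / qq
            candidates[j] = [ci - g * qi for ci, qi in zip(candidates[j], q)]
    return selected

# Simulated data (an assumption for illustration): 10 variables,
# of which only variables 2 and 7 influence the response.
random.seed(0)
n_obs, n_vars = 100, 10
columns = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
y = [3.0 * columns[2][i] - 2.0 * columns[7][i] + 0.1 * random.gauss(0, 1)
     for i in range(n_obs)]

chosen = forward_selection(columns, y, max_vars=2)
print("Selected variables:", sorted(chosen))
```

With only two informative variables and little noise, the greedy search is expected to recover them. On strongly collinear data such as spectra, the same procedure can behave quite differently, which is one reason the choice of method matters.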