Common Mistakes in Machine Learning (and how not to make them)
Course Description
Machine learning tools for chemometrics make it possible to handle complex data and extract useful information. Unfortunately, moving from univariate to multivariate analysis does not mean there are fewer pitfalls and potential problems in the data analysis. In this course, we will go through many of the problems that occur when analyzing and interpreting multivariate data. The examples focus mainly on PCA and PLS, but most of the conclusions apply generally.
The course includes hands-on computer time in which participants work through example problems using PLS_Toolbox or Solo.
Prerequisites
Chemometrics I–PCA and Chemometrics II–Regression and PLS or equivalent experience.
Course Outline
- Chemometrics – the basic idea
- Variability is not information
- How to measure variability – misuse of correlations
- Interpreting scores and loadings
- Interpreting biplots
- How to optimize a model
- Interpreting a regression model
- Interpreting a regression vector
- Understanding correlation and causality
- How to determine the number of components