Outliers are a common problem in industrial data sets. In fact, the presence of outliers is more the norm than the exception. These unusual, often “erroneous” observations heavily affect the classical estimates of the mean, variance, and covariance of the data. Without proper treatment, the resulting data models do not accurately represent the bulk of the data. Alternatively, outlier samples are sometimes the most interesting samples in a data set, revealing unique properties or trends. If these samples are not identified, opportunities for discovery can be missed. Robust methods deal with the problem of outliers by determining which samples represent the “consensus” in the data and basing the models on those samples, while ignoring the outliers. The course starts with methods for robust estimation of the mean and variance/covariance and goes on to methods for robust Principal Components Analysis and Partial Least Squares regression.
- The outlier problem
- Robust estimation of the mean: Median
- Robust estimation of the covariance matrix: Minimum Covariance Determinant
- Robust linear regression
- Robust Principal Components Analysis: ROBPCA
- Robust Regression in High Dimensions: ROBPCR and ROBPLS
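
As a small illustration of the first topics, the sketch below compares the classical mean with the median on a data set containing one gross outlier, and flags samples that lie far from the median relative to the median absolute deviation (MAD). It is a minimal example under stated assumptions, not course material: the readings are made-up values, the helper names `scaled_mad` and `flag_outliers` are hypothetical, and the cutoff of 3 is a common rule of thumb rather than a prescribed setting.

```python
import statistics

# Made-up sensor readings (assumption): six typical values and one gross outlier.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 55.0]


def scaled_mad(values):
    """Median absolute deviation, scaled by 1.4826 so that it is
    comparable to the standard deviation for normally distributed data."""
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    return 1.4826 * statistics.median(deviations)


def flag_outliers(values, cutoff=3.0):
    """Return the values lying more than `cutoff` scaled MADs from the median.
    The cutoff of 3 is a rule-of-thumb assumption, not a fixed standard."""
    med = statistics.median(values)
    spread = scaled_mad(values)
    return [v for v in values if abs(v - med) / spread > cutoff]


classical_mean = statistics.mean(readings)    # pulled upward by the outlier
robust_center = statistics.median(readings)   # stays with the bulk of the data

print("mean:", round(classical_mean, 2))
print("median:", robust_center)
print("flagged:", flag_outliers(readings))
```

Here the single extreme reading drags the mean well away from the other six values, while the median stays with the consensus of the data; the same idea, applied to covariance, regression, and principal components, underlies the later topics in the list.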