Eigenvector University Europe is in Rome, Italy, October 14-17, 2024.

Diviner

Semi-automated Machine Learning for Accelerating Model Development

PLS_Toolbox and Solo offer the widest variety of methods for modeling analytical data, including preprocessing, variable selection, outlier detection, and regression methods. The challenge has been how to search over these options efficiently and produce a high-quality model in limited time. Enter Diviner, Eigenvector’s new tool for accelerating model development. Diviner goes step by step through the modeling process, evaluating hundreds of modeling options, with key stopping points that allow the analyst to review results and learn from them before proceeding to the next step. Modeling choices are evaluated and the results presented graphically in a totally transparent process. In the end, the analyst can select a single model or an ensemble of models for deployment or further refinement.

The sections that follow review the Diviner workflow.

Initial Setup

Selecting Diviner in the Browser brings up the main Diviner interface, shown below. Calibration data is loaded into the interface, and validation data may optionally be supplied as well. Detailed settings (outlier detection sensitivity, addition of MLR models, etc.) can be found under the Edit menu.

Once the data are loaded, the preprocessing options are set. Clicking the Preprocessing button brings up the Preprocessing Settings window, which offers preset libraries for different types of spectroscopic and other data. These preprocessing recipes can be customized using any of the many methods supported in PLS_Toolbox and Solo. Recipes can also be specified for the preliminary outlier detection models. Cross-validation parameters are also set from the main interface.
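As a rough illustration of what a preprocessing recipe is, the Python sketch below treats a recipe as an ordered list of steps applied one after another, and a library as a named collection of recipes. The step functions (`snv`, `first_derivative`, `mean_center`), the recipe names, and the `RECIPES` dictionary are hypothetical simplifications written for this example; they are not PLS_Toolbox or Solo functions, and the real preprocessing methods have many more options.

```python
import numpy as np

# Hypothetical, simplified preprocessing steps (not PLS_Toolbox/Solo functions).
def snv(X):
    """Standard normal variate: center and scale each row (sample) individually."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd

def first_derivative(X):
    """Crude first derivative: finite differences between adjacent variables."""
    return np.diff(X, axis=1)

def mean_center(X):
    """Subtract each column's mean (centering is computed on X itself here)."""
    return X - X.mean(axis=0)

# A recipe is an ordered list of steps; a library is a named set of recipes.
RECIPES = {
    "mncn": [mean_center],
    "snv+mncn": [snv, mean_center],
    "deriv+mncn": [first_derivative, mean_center],
    "snv+deriv+mncn": [snv, first_derivative, mean_center],
}

def apply_recipe(X, steps):
    """Apply the steps in order, feeding each output into the next step."""
    for step in steps:
        X = step(X)
    return X

# Toy data: 3 samples x 4 variables
X = np.array([[1.0, 2.0, 4.0, 7.0],
              [2.0, 3.0, 5.0, 9.0],
              [0.5, 1.5, 3.5, 6.0]])

for name, steps in RECIPES.items():
    print(name, apply_recipe(X, steps).shape)
```

In a real calibration, centering and scaling parameters are computed on the calibration data and then reused for new samples; the sketch recomputes them on whatever matrix it receives to keep it short. Each recipe, combined with each number of latent variables and with or without variable selection, yields another candidate model, which is how the model count grows into the hundreds.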

Outlier Review

Outlier detection can be enabled on the main interface. If it is turned off, Diviner proceeds with the data as is. When it is on, the outlier detection module constructs Robust PCA and Robust PLS models using the outlier detection preprocessing choices set previously. Several diagnostic plots are then available, including the Potential Outlier Status plot shown below, which shows potential outliers for each of the preprocessing methods surveyed. The individual Robust PCA and PLS models used to construct the plot can be accessed and interrogated. A final selection of samples to remove from modeling is then made (shown in green), and modeling proceeds. This selection can be based on a single model or on a consensus of all the models, or it can be made by hand.
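The page does not specify how the consensus is formed. One simple possibility, assumed here purely for illustration, is a vote: a sample is removed if at least some fraction of the outlier models flag it. The function `consensus_outliers`, the model names, and the flagged sample indices below are all hypothetical.

```python
from collections import Counter

def consensus_outliers(flags_by_model, min_fraction=0.5):
    """Return sample indices flagged by at least min_fraction of the models.

    flags_by_model maps a model name to the sample indices it flagged.
    The voting rule is an illustrative assumption, not Diviner's documented rule.
    """
    n_models = len(flags_by_model)
    # Count each sample at most once per model
    counts = Counter(i for flagged in flags_by_model.values() for i in set(flagged))
    return sorted(i for i, c in counts.items() if c / n_models >= min_fraction)

# Made-up flags from robust models built with three preprocessing recipes
flags = {
    "robust PCA, snv": {3, 17, 42},
    "robust PCA, deriv": {17, 42},
    "robust PLS, mncn": {17, 58},
}

print(consensus_outliers(flags))       # majority vote: [17, 42]
print(consensus_outliers(flags, 1.0))  # unanimous: [17]
```

Raising `min_fraction` removes fewer samples (only those every model agrees on), while basing the selection on a single model corresponds to passing a dictionary with just that model's flags.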

Initial Model Generation

Once outliers are set, Diviner develops models using all of the specified preprocessing recipes. It also performs automatic variable selection on all of the models (using a method based on Variable Importance in Projection and Selectivity Ratio) and evaluates each over the specified number of latent variables (LVs). All of the models can then be compared as shown in the plot below, where each point represents a model (680 of them in this plot). Here the overfit ratio (Root-Mean-Square Error of Cross-Validation divided by Root-Mean-Square Error of Calibration, RMSECV/RMSEC) is plotted against cross-validation error (RMSECV), and models are keyed by number of LVs. If a validation set is supplied, it is also possible to plot against Root-Mean-Square Error of Prediction (RMSEP).
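The overfit ratio can be computed directly from the reference values, the calibration fit, and the cross-validated predictions. This minimal sketch assumes the RMSE divides by the number of samples n with no degrees-of-freedom correction, which may differ from the exact convention PLS_Toolbox uses; the helper names and the numbers are illustrative only.

```python
import math

def rmse(y_ref, y_hat):
    """Root-mean-square error; divides by n (no degrees-of-freedom correction assumed)."""
    if len(y_ref) != len(y_hat) or not y_ref:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((r - h) ** 2 for r, h in zip(y_ref, y_hat)) / len(y_ref))

def overfit_ratio(y_ref, y_fit, y_cv):
    """RMSECV / RMSEC: how much worse the model does on held-out samples."""
    return rmse(y_ref, y_cv) / rmse(y_ref, y_fit)

# Illustrative numbers only
y_ref = [1.0, 2.0, 3.0, 4.0]   # reference values
y_fit = [1.1, 1.9, 3.1, 3.9]   # calibration fit (model trained on all samples)
y_cv = [1.2, 1.8, 3.2, 3.8]    # cross-validated predictions (each sample held out)

rmsec = rmse(y_ref, y_fit)
rmsecv = rmse(y_ref, y_cv)
print(round(rmsec, 3), round(rmsecv, 3), round(overfit_ratio(y_ref, y_fit, y_cv), 2))
# 0.1 0.2 2.0
```

Because cross-validated predictions are made on samples the model did not see during fitting, RMSECV is usually larger than RMSEC, so the ratio is typically above 1. A model whose ratio is much larger than its peers is likely fitting noise in the calibration data.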

For visualization, numerous class sets are created that group the models by the preprocessing methods used. In the figure below, for instance, the models are plotted and classed by the type of normalization used. It is also easy to show models using derivatives, models with variable selection, and of course the exact preprocessing used in any particular model. From these plots, trends can be observed in what generally improves models (or doesn’t).
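Grouping models into classes amounts to labeling each model with an attribute of its preprocessing and summarizing performance by label. The sketch below uses hypothetical model records with made-up RMSECV values and computes the median RMSECV for each normalization class; `group_by` and the record fields are illustrative assumptions, not Diviner's internal data structure.

```python
from collections import defaultdict
from statistics import median

# Hypothetical model summaries (values made up for illustration)
models = [
    {"id": 1, "normalization": "none", "rmsecv": 0.42},
    {"id": 2, "normalization": "SNV", "rmsecv": 0.31},
    {"id": 3, "normalization": "SNV", "rmsecv": 0.35},
    {"id": 4, "normalization": "MSC", "rmsecv": 0.33},
    {"id": 5, "normalization": "none", "rmsecv": 0.47},
    {"id": 6, "normalization": "MSC", "rmsecv": 0.29},
]

def group_by(records, key):
    """Collect records into lists keyed by the value of record[key]."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)

# Median cross-validation error per normalization class
summary = {
    cls: median(m["rmsecv"] for m in members)
    for cls, members in group_by(models, "normalization").items()
}

for cls, med in sorted(summary.items(), key=lambda kv: kv[1]):
    print(f"{cls}: median RMSECV {med:.3f}")
```

Swapping the grouping key (for example, a hypothetical `"derivative"` or `"variable_selection"` field) gives the other views described above.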

Models can be selected for refinement from any of the plots. We suggest using the plot of overfit (RMSECV/RMSEC) versus cross-validation error (RMSECV), or versus prediction error on the validation set (RMSEP), and selecting models with low (good) cross-validation or prediction error and low overfit values, i.e., models in the lower left corner.
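One way to make the lower left corner concrete is Pareto (non-dominated) filtering, swapped in here as an illustrative technique rather than anything Diviner is documented to do: keep every model for which no other model has both lower-or-equal RMSECV and lower-or-equal overfit, with at least one of the two strictly lower. The `pareto_front` helper and the model values are hypothetical.

```python
def pareto_front(models):
    """Return models not dominated on (rmsecv, overfit); lower is better for both."""
    front = []
    for m in models:
        dominated = any(
            o["rmsecv"] <= m["rmsecv"] and o["overfit"] <= m["overfit"]
            and (o["rmsecv"] < m["rmsecv"] or o["overfit"] < m["overfit"])
            for o in models
        )
        if not dominated:
            front.append(m)
    return front

# Made-up candidates
models = [
    {"id": "A", "rmsecv": 0.30, "overfit": 1.60},
    {"id": "B", "rmsecv": 0.32, "overfit": 1.15},
    {"id": "C", "rmsecv": 0.35, "overfit": 1.10},
    {"id": "D", "rmsecv": 0.36, "overfit": 1.40},  # worse than B on both axes
    {"id": "E", "rmsecv": 0.45, "overfit": 1.05},
]

print([m["id"] for m in pareto_front(models)])  # ['A', 'B', 'C', 'E']
```

The front still contains trade-offs (model E has the lowest overfit but a comparatively high RMSECV, while A has the reverse), so picking among the surviving models remains the analyst's call.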

Refinement of Candidate Models and Final Model Selection

Selected models can be further refined with variable selection using iPLS (interval PLS) and fine-tuning of the preprocessing. They can also be tested to see whether samples previously flagged as outliers can be re-included in the model. The performance of these refined models is presented graphically as above. A single final model may be selected, or a number of models can be selected for use as an ensemble. When an ensemble is used for prediction in Solo_Predictor, the average or median of the ensemble’s predictions can be used. Final models can also be inspected and refined by the analyst if desired.
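Combining an ensemble’s predictions with the average or median can be sketched as follows. This is a plain-Python illustration, not the Solo_Predictor interface; `ensemble_predict` is a hypothetical helper, and each inner list is assumed to hold one model’s predictions for the same samples in the same order.

```python
from statistics import mean, median

def ensemble_predict(predictions_by_model, how="median"):
    """Combine per-model predictions sample by sample using the mean or median."""
    combine = {"mean": mean, "median": median}[how]
    # zip(*...) regroups the per-model lists into per-sample tuples
    return [combine(sample) for sample in zip(*predictions_by_model)]

# Made-up predictions for three samples from three models
preds = [
    [10.1, 12.0, 8.9],  # model 1
    [10.3, 11.8, 9.4],  # model 2
    [9.9, 12.6, 9.0],   # model 3
]

print(ensemble_predict(preds, "median"))  # [10.1, 12.0, 9.0]
print(ensemble_predict(preds, "mean"))
```

The median is less sensitive than the mean to a single model that predicts far from the others, which is one reason to prefer it when ensemble members disagree occasionally.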

Total Transparency

Diviner keeps the analyst apprised of all the results and decisions as it proceeds through the model-building process. The result is a totally transparent process in which the analyst can use their knowledge and preferences to guide modeling without having to build and test every model individually. And by aggregating results from a wide variety of models, the analyst is likely to learn something about the data from what works and what doesn’t. In the end, Diviner greatly speeds model development but doesn’t shut out the analyst!

System Requirements

Diviner is included in PLS_Toolbox and Solo 9.5. In general, Eigenvector products should work on most modern computers. See our installation instructions (https://wiki.eigenvector.com/index.php?title=Installation) for detailed information.

PLS_Toolbox does not require any other MATLAB toolboxes but will make use of the Parallel Computing Toolbox in certain scenarios if present.

Product Support

Eigenvector Research offers user support for PLS_Toolbox and Solo by e-mail at helpdesk@eigenvector.com. Questions are almost always answered within 24 hours (and usually much less). Updates and bug fixes are available for users to download from our web site. For information on other support options, see our technical support page.

Get More Information

Order PLS_Toolbox or Solo

For information on multi-client servers, site licenses, and OEM options, contact us by phone (509.662.9213) or e-mail (sales@eigenvector.com). Our product price list information page includes pricing and other order information for all of our products.