Category Archives: Software
Feb 16, 2019
Chimiométrie 2019 was held in Montpellier, January 30 to February 1. Now in its 20th year, the conference attracted over 150 participants. The conference is conducted mostly in French (which I have been trying to learn for many years now), but there are also talks in English. The Scientific and Organizing Committee Presidents were Ludovic Duponchel and J.M. Roger, respectively.
Eigenvector was proud to sponsor this event, and it was fun to have a display table and a chance to talk with some of our software users in France. As usual, I was on the lookout for talks and posters using PLS_Toolbox. I especially enjoyed the talk presented by Alice Croguennoc, “Some aspects of SVM Regression: an example for spectroscopic quantitative predictions.” The talk provided a nice intro to Support Vectors and good examples of what the various parameters in the method do. Alice used our implementation of SVMs, which adds our preprocessing, cross-validation and point-and-click graphics to the publicly available LIBSVM package. Ms. Croguennoc demonstrated some very nice calibrations on a non-linear spectroscopic problem.
I also found three very nice posters which utilized PLS_Toolbox:
“Chemometric methods applied to FT-ICR/MS data: comprehensive study of aromatic sulfur compounds in gas oils” by J. Guillemant, M. Lacoue-Nègre, F. Albrieux, L. Duponchel, L.P. de Oliveira and J.F. Joly.
“Chemometric tools associated to FTIR and GC-MS for the discrimination and the classification of diesel fuels by suppliers” by I. Barra, M. Kharbach, Y. Cherrah and A. Bouklouze.
“Preliminary appreciation of biodegradation of formate and fluorinated ethers by means of Raman spectroscopy coupled with chemometrics” by M. Marchetti, M. Offroy, P. Bourson, C. Jobard, P. Branchu, J.F. Durmont, G. Casteran and B. Saintot.
By all accounts the conference was a great success, with many good talks and posters covering a wide range of chemometric topics, a great history of the field by Professor Steven D. Brown, and a delicious and fun Gala dinner at the fabulous Chez Parguel, shown at left. The evening included dancing, and also a song, La Place De la Conférence Chimiométrie (sung to the tune of Patrick Bruel’s Place des Grands Hommes), written by Sylvie Roussel in celebration of the conference’s 20th year and sung with great gusto by the conferees. Also, the lecture hall on the SupAgro campus was very comfortable!
Congratulations to the conference committees for a great edition of this French tradition, with special thanks to Cécile Fontange and Sylvie Roussel of Ondalys for their organizational efforts. À l’année prochaine!
Jan 10, 2019
I logged in to LinkedIn this morning and found a discussion about Python that had a lot of references to PLS_Toolbox in it. The thread was started by one of our long-time users, Erik Skibsted, who wrote:
“MATLAB and PLS_Toolbox has always been my preferred tools for data science, but now I have started to play a little with Python (and finalised my first on-line course on Data Camp). At Novo Nordisk we have also seen a lot of small data science initiatives last year where people are using Python and I expect that a lot more of my colleagues will start coding small and big data science projects in 2019. It is pretty impressive what you can do now with this open source software and different libraries. And I believe Python will be very important in the journey towards a general use of machine learning and AI in our company.”
This post prompted well over 20 responses. As the creator of PLS_Toolbox, I thought I should jump in on the discussion!
In his response, Matej Horvat noted that Python and other open source initiatives were great “if you have the required coding skills.” This is a key phrase. PLS_Toolbox doesn’t require any coding skills at all. You can use it entirely in point-and-click mode and still get to 90% of what it has to offer. (This makes it the equivalent of using our stand-alone product Solo.) When you are working with PLS_Toolbox interfaces it looks like the first figure below.
Of course if you are a coder you can take advantage of the ability to also use it in command line mode and build it into your own scripts and functions, just like you would do with other MATLAB toolboxes. The caveat is that you can’t redistribute it without an additional license from us. (We do sell these of course, contact me if you are interested.) When you are working with Python, (or developing MATLAB scripts incorporating PLS_Toolbox functions for that matter), it looks like the second figure.
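To give a feel for what the do-it-yourself route involves, here is a minimal sketch of a PLS1 regression coded by hand in Python with NumPy, using the classic NIPALS algorithm. Everything below (function name, data) is invented for illustration; this is not EVRI code, just the kind of thing you end up writing yourself when you go the "coding skills" route.

```python
import numpy as np

def pls1_nipals(X, y, n_lv=2):
    """Minimal PLS1 regression via NIPALS: returns the regression vector b
    for mean-centered X and y (illustrative only, no error checking)."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)        # weight vector
        t = Xr @ w                       # scores
        p = Xr.T @ t / (t @ t)           # X loadings
        qi = (yr @ t) / (t @ t)          # inner regression coefficient
        Xr = Xr - np.outer(t, p)         # deflate X
        yr = yr - qi * t                 # deflate y
        W.append(w); P.append(p); q.append(qi)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # standard PLS1 regression vector: b = W (P'W)^-1 q
    return W @ np.linalg.solve(P.T @ W, q)

# invented example data: y depends linearly on the first two variables
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1]
b = pls1_nipals(X, y, n_lv=2)
y_hat = (X - X.mean(axis=0)) @ b + y.mean()
```

And that is before any preprocessing options, cross-validation, plotting or error handling, which is exactly the scaffolding a packaged tool provides for you.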
Like Python, PLS_Toolbox is “open source” in the sense that you can actually see the code. We’re not hiding anything proprietary in it. You can find out exactly how it works. You can also modify it if you wish; just don’t ask for help once you do that!
Unlike typical open source projects, with PLS_Toolbox you also get user support. If something doesn’t work we’re there to fix it. Our helpdesk has a great reputation for prompt responses that are actually helpful. That’s because the help comes from the people who actually developed the software.
Another reason to use PLS_Toolbox is that we have implemented a very wide array of methods and put them into the same framework so that they can be evaluated in a consistent way. For instance, we have PLS-DA, SVM-C, and now XGBoost all in the same interface that use the exact same preprocessing and are all cross-validated and validated in the same exact way so that they can be compared directly.
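To make the consistency point concrete, here is a small sketch (plain NumPy, invented data, and toy stand-in classifiers rather than actual PLS-DA, SVM-C or XGBoost) of what “same preprocessing, same cross-validation” means in practice: every method is scored on identically preprocessed, identical folds, so the resulting error rates are directly comparable.

```python
import numpy as np

rng = np.random.default_rng(0)
# invented two-class data: class 0 near the origin, class 1 shifted
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(2, 1, (20, 4))])
y = np.array([0] * 20 + [1] * 20)

def nearest_mean(Xtr, ytr, Xte):
    """Toy stand-in classifier: assign each sample to the closer class mean."""
    m0, m1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    d0 = ((Xte - m0) ** 2).sum(axis=1)
    d1 = ((Xte - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def one_nn(Xtr, ytr, Xte):
    """Toy stand-in classifier: 1-nearest-neighbor."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    return ytr[d.argmin(axis=1)]

def cv_error(clf, X, y, n_folds=5):
    """Cross-validate with the SAME folds and the SAME preprocessing
    (autoscaling fit on each training fold only) for every classifier."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, n_folds)
    errors = 0
    for f in folds:
        tr = np.setdiff1d(idx, f)
        mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
        Xtr, Xte = (X[tr] - mu) / sd, (X[f] - mu) / sd
        errors += int((clf(Xtr, y[tr], Xte) != y[f]).sum())
    return errors / len(y)

err_nm = cv_error(nearest_mean, X, y)
err_nn = cv_error(one_nn, X, y)
```

Because both classifiers are fed through the identical `cv_error` harness, `err_nm` and `err_nn` are an apples-to-apples comparison, which is the design idea behind putting every method in one interface.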
If you want to be able to freely distribute the models you generate with PLS_Toolbox, we have a tool for that: Model_Exporter. Model_Exporter allows users to export the majority of our models as code that you can compile into other languages, including direct export of Python code. You can then run the models anywhere you like, such as for making online predictions in a control system or with handheld spectrometers such as ThermoFisher’s TruScan. Another route to online predictions is our stand-alone Solo_Predictor, which can run any PLS_Toolbox/Solo model and communicates using a number of popular protocols.
PLS_Toolbox is just one piece of the complete chemometrics solutions we provide. We offer training at our renowned Eigenvector University and many other venues such as the upcoming course in Tokyo, EigenU Online, and an extensive array of help videos. And if that isn’t enough we also offer consulting services to help you develop and implement new instruments and applications.
So before you spend a lot of valuable time developing applications in Python, make sure you’re not just recreating tools that already exist at Eigenvector!
Nov 22, 2017
Integration of Eigenvector’s multivariate analysis software with Metrohm’s Vis-NIR analyzers will give users access to advanced calibration and classification methods.
Metrohm’s spectroscopy software Vision Air 2.0 supports prediction models created in EVRI’s PLS_Toolbox and Solo software and offers convenient export and import functionality to enable measurement execution and sample analysis in Metrohm’s Vision Air software. Customers will benefit from data transfer between PLS_Toolbox/Solo and Vision Air and will enjoy a seamless experience when managing models and using Metrohm’s NIR laboratory instruments. Metrohm has integrated Eigenvector’s prediction engine, Solo_Predictor, so that users can apply any model created in PLS_Toolbox/Solo.
Data scientists, researchers and process engineers in a wide variety of industries that already use or would like to use Eigenvector software will find this solution appealing. PLS_Toolbox and Solo’s intuitive interface and advanced visualization tools make calibration, classification and validation model building a straightforward process. A wide array of model types, preprocessing methods and the ability to create more complex model forms, such as hierarchical models with conditional branches, make Eigenvector software the preferred solution for many.
“This is a win-win for users of Metrohm NIR instruments and users of Eigenvector chemometrics software,” says Eigenvector President Dr. Barry M. Wise. “Thousands of users of EVRI software will be able to make models for use on Metrohm NIR instruments in their preferred environment. And users of Metrohm NIR instruments will have access to more advanced data modeling techniques.”
Researchers benefit from Metrohm’s Vis-NIR instruments and Vision Air software through coverage of the visible and NIR wavelength range, intuitive operation, state-of-the-art user management with strict SOPs, and global networking capabilities. Combining the solutions creates an integrated experience that will save time, improve the product development process and provide better control of product quality.
Key Advantages of PLS_Toolbox/Solo:
- Integration of Solo_Predictor allows users to run any model developed in PLS_Toolbox/Solo
- Allows users to make calibration and classification models in PLS_Toolbox and Solo’s user-friendly modeling environment
- Supports standard model types (PCA, PLS, PLS-DA, etc.) with wide array of data preprocessing methods
- Advanced models (SVMs, ANNs, etc.) and hierarchical models also supported
Key Advantages of Vision Air:
- Intuitive workflow due to an appealing and smart software concept with specific working interfaces for routine users and lab managers
- Database approach for secure data handling and easy data management
- Powerful network option with global networking possibility and one-click instrument maintenance
- Full CFR Part 11 compliance
Aug 3, 2017
Hello EigenFriends and EigenFans,
The ICNIRS conference was held June 11-15 in Copenhagen, Denmark, where close to 500 colleagues gathered for the largest forum on Near-Infrared Spectroscopy in the world. The conference featured several keynote lectures, classes taught by EVRI Associate Professor Rasmus Bro, and several poster sessions where over 20 conference attendees displayed their research using EVRI software! We’d like to feature some of the posters and authors below: thanks for using our software, everyone!
- Y. Allouche, J.A. Fernandez Perna, V. Baeten, & A. Jimenez. “On-line Near Infrared Spectroscopy and Chemometrics for Characterization of Olive Oils at the Exit of a Decanter Centrifuge”
- C. Y. Bastidas, C. von Plessing, J. Troncoso, & R. del Pilar Castillo. “Quantification of an Antibiotic in Salmon Feed Pellets with NIR Spectroscopy and Multivariate Calibration”
- B. Carrasco, D. Vincke, V. Baeten, & J.A. Fernandez Perna. “Application of Near Infrared Spectroscopy and Chemometrics for the Characterization of Complex Mixtures of Food Additives”
- M. Chaudhry, G. Colelli, & M. Amodio. “Potential of Hyperspectral Imaging to Predict Quality and Shelf-Life of Fresh Rocket Leaves to be Used for Fresh-Cut Processing”
- S. Duthen, D. Kleiber, J. Dayde, C. Raynaud, & C. Levasseur-Garcia. “Determination of the Moisture Content of Gelatin Sample”
- D. Eylenbosch, J. A. Fernandez Pierna, V. Baeten, & B. Bodson. “Comparison of PLS and SVM Discriminant Analysis for NIR Hyperspectral Data of Wheat Roots in Soil”
- L. Franca, S. Grassi, M. F. Pimentel, & J. M. Amigo. “Handcraft Beer Monitoring Using NIR Handheld Equipment”
- R. Gasbarrone, S. Serranti & G. Bonifazi. “An Investigation on Non-ferrous Metals Particles Separability from Electronic Scraps using Hyperspectral Imaging and Micro-XRF Analysis”
- S. Montagneir, J. Lallemand, P. Herbert, J. Guilment, & S. Roussel. “Discriminant Strategies for Polymer Identification during Continuous On-line Processes by Near Infrared Spectroscopy”
- R. Palmieri, S. Serranti, G. Bonifazi, & F. Maffei. “Monitoring of Microplastics from Marine Environment Adopting HyperSpectral Imaging”
- J. F. Q. Pereira, C. S. Silva, M. J. Vieira, M. F. Pimentel, A. Braz, & R. S. Honorato. “Evaluation and Identification of Blood Stains in Crime Scenes with Ultra-portable NIR Spectrometer”
- Y. Pu, D. Sun, C. Riccioli, M. Buccheri, M. Grassi, T. M. P. Cattaneo, & A. Gowen. “Calibration Transfer from MicroNIR Spectrometer to Hyperspectral Imaging: A Case Study on Predicting Soluble Solids Content of Bananito Fruit (Musa acuminata)”
- M. M. Reis, I. Kaur, G. Weralupitiya, C. Wang, & M. G. Reis. “Near InfraRed Spectroscopy Applied to Non-invasive Assessment of Physical-chemical Attributes of Dairy Powders”
- R. Rios-Reina, D. L. Garcia-Gonzalez, R. M. Collejon, & J. M. Amigo. “Application of Near-Infrared (NIR) Spectroscopy and Chemometrics to Classify and Authentify Wine Vinegars from Different Protected Designation of Origin”
- J. Sun, A. McGlone, R. Kunnememeyer, N. Tomer, & M. Punter. “Which Optical Geometry is Best to Detect Vascular Browning in Apples?”
Oct 20, 2016
Scott Koch joined Eigenvector in January 2004 and quickly made himself indispensable. Although his title is Senior Software Engineer, Scott commented the other day, “we wear so many hats in a small company that I don’t know if a title is really useful.” He tackles a variety of jobs at EVRI including interface and database design, software version control and general troubleshooting. Scott is fluent in MATLAB, SQL, Java, Subversion and Python and contributes to our products, e.g. PLS_Toolbox and Solo, as well as working with our clients on custom applications. You can also catch him as a guest blogger for Undocumented Matlab, where he discusses his work involving Matlab-Java programming.
In addition to working at Eigenvector, Scott is also an outdoor enthusiast and distance runner, and can often be seen flying through the trails of the west coast. Here he is running the Broken Arrow Trail near Sedona, AZ. He’s also an avid backcountry skier and rally car driver.
Says Barry, President of EVRI: “When we hired Scott there was one thing on his resume that really caught my eye: ‘PSIA level II Ski Instructor: Possess an uncanny ability to coax terrified beginners down steep slopes and back onto the chair lift.’ Scott has been all that and more with regard to helping users with our software, and he’s filled a lot of gaps in our development process I didn’t even realize we had. We’re so glad Scott is part of our team!”
Thanks Scott for all your hard work! We have a lot of fun hanging out with you, and you inspire us with your athletic passions and drive to make this company better.
Feb 4, 2016
Last month I had the pleasure of attending Chimiométrie XVII. This installment ran from January 17-20 in the beautiful city of Namur, Belgium. The conference was largely in French but with many talks and posters in English. (My French is just good enough that I can get the gist of most of the French talks if the speakers put enough text on their slides!) There were many good talks and posters demonstrating a lot of chemometric activity in the French-speaking world.
I was pleased to see evidence of EVRI software in many presentations and posters. I particularly enjoyed “An NIRS Prediction Engine for Discrimination of Animal Feed Ingredients” by Aitziber Miguel Oyarbide working with the folks at AUNIR. This presentation was done with Prezi which I find quite refreshing. I also enjoyed posters about standardization in milk analysis, determination of post mortem interval, evaluation of pesticide coating on cereal seeds, and sorting of archeological material. All of these researchers used PLS_Toolbox, MIA_Toolbox or Solo to good effect.
EVRI was also proud to sponsor the poster contest which was won by Juan Antonio Fernández Pierna et al. with “Chemometrics and Vibrational Spectroscopy for the Detection of Melamine Levels in Milk.” For his efforts Juan received licenses for PLS_Toolbox and MIA_Toolbox. Congratulations! We wish him continued success in his chemometric endeavors!
Finally I’d like to thank the organizing committee, headed by Pierre Dardenne of Le Centre wallon de Recherches agronomiques. The scientific content was excellent and, oh my, the food was fantastic! I’m already looking forward to the next one!
Oct 22, 2014
Model_Exporter is EVRI’s software for turning multivariate/chemometric models into formats which can be compiled into online applications. It offers an alternative to our stand-alone prediction engine, Solo_Predictor. Model_Exporter allows users of our MATLAB® based PLS_Toolbox and stand-alone Solo to easily create numerical recipes of their models. These recipes give the step-by-step procedure that takes a measurement and calculates the desired outputs, such as concentration, class assignment, prediction diagnostics, etc. This includes applying all preprocessing steps along with the model (PCA, PLS, PLS-DA, etc.) itself. When Model_Exporter is installed, models can be exported into predictor files in a variety of formats via the file menu in the Analysis window as shown below.
Model_Exporter also includes two versions of the freely-distributable Model_Interpreter. Either the C# or Java version of Model_Interpreter can be used by any third-party program to add the ability to parse an exported model in XML format. Simply point the interpreter at an XML-exported model and supply the data from which to make a prediction. The interpreter applies the model and returns the results. Model_Interpreter has no licensing fees and is appropriate for use on standard processors and operating systems or on handheld devices run by reduced instruction set processors (e.g. ARM). Your application doesn’t need to know anything about the preprocessing or model being used.
Version 3.0 of Model_Exporter was released in early October along with its associated stand-alone Solo+Model_Exporter version 7.9. This release includes support for Support Vector Machine (SVM) regression and classification models as well as Artificial Neural Network (ANN) regression models.
These changes represent a significant addition to Model_Exporter making it even more unique in the chemometrics world. No other chemometric modeling product offers anything as transparent, flexible or unencumbered by licensing. You can get more info about Model_Exporter by consulting the Release Notes and the Model_Exporter Wiki page.
Users with current maintenance can access these versions now from their account. If expired, maintenance can be renewed through the “Purchase” tab.
If you have any questions, feel free to write us at email@example.com.
Oct 7, 2014
The MathWorks released MATLAB R2014b (version 8.4) last week, and right on its heels we released PLS_Toolbox 7.9. R2014b has a number of improvements that MATLAB and PLS_Toolbox users will appreciate, particularly in graphics. The new MATLAB is more aesthetically pleasing to the eye, easier for the Color Vision Deficiency (CVD) challenged, and smoother due to better anti-aliasing. An example is shown below, where the new CVD-friendly Parula color map is used to indicate the Q-residual values of the samples.
But the most significant changes in R2014b are really for people (like us) that program in MATLAB. For instance, TMW didn’t just change the look of the graphics, they actually changed the entire handle graphics system to be object oriented. They also added routines useful in big data applications, and improved their handling of date and time data. When you start the new MATLAB the command window greets you with this:
“Some existing code may need to be revised to work in this version of MATLAB.” That is something of an understatement. In fact, R2014b required the update of almost every interface from PLS_Toolbox 7.8. Revising our code to work with R2014b required hundreds of hours. But the good news for our users is that we were ready with PLS_Toolbox 7.9 when R2014b was released AND, as always, we made our code work with previous versions of MATLAB (back to R2008a). This, of course, is the significant difference between a supported commercial product and freeware. Not only do you get new features regularly, but you can rely on the software being supported as operating systems and platforms change.
So if you look at the Version 7.9 Release Notes, you won’t see a lot of major changes. Instead, we took the time to assure compatibility with R2014b and made many minor changes to improve usability and stability.
The new MATLAB will allow our command-line and scripting users to do their science more efficiently and present their result more elegantly. These improvements will benefit us as well, and will ultimately translate into continued improvement in PLS_Toolbox and Solo.
Jan 6, 2014
On New Year’s day 2014 Eigenvector Research, Inc. (EVRI) celebrated its 19th birthday and began its 20th year. The momentum that carried us into 2013 built throughout the year and resulted in our largest year-over-year software sales increase since 2007. Our best three software sales months ever have all been within the last five months. Clearly our partnering with analytical instrument makers and software integrators plus our tools for putting models on-line are striking a responsive chord with users.
The consulting side of our business also continues to be very busy as we assist our clients to develop analytical methods in a wide variety of applications including medical, pharmaceutical, homeland security (threat detection), agriculture, food supplements, energy production and more.
The third leg of our business, chemometrics training, continued unabated as we taught on-site courses for government and industry, courses at conferences and held the 8th edition of our popular Eigenvector University (EigenU). We enter 2014 firing on all cylinders!
Major additions to PLS_Toolbox and Solo in 2013 included the Model Optimizer, Hierarchical Model Builder, a new Artificial Neural Network (ANN) tool, and several new file importers. We will soon release an additional ANN option along with new tools for instrument standardization/calibration transfer. Also on the horizon, a major new release of Solo_Predictor will include an enhanced web interface option and additional instrument control and scripting options.
2014 includes a busy schedule with conferences, talks, conference exhibits and short courses. Below is a listing of where you’ll be able to find us:
- January 21-24, IFPAC, Arlington, VA. BMW to present “Mixed Hierarchical Models for the Process Environment” and “A Multivariate Calibration Model Maintenance Road Map.”
- March 2-6, Pittcon Chicago, IL. NBG and RTR will be at the EVRI exhibition booth.
- April 27-May 2, EigenU 2014, 9th Annual Eigenvector University, Seattle, WA. Join the complete EVRI staff for 6 days of courses and events.
- May 6-9, EuroPACT, Barcelona, Spain. BMW to give plenary address “Model Maintenance: the Unrecognized Cost in PAT and QbD” and a condensed version of our “Chemometrics without Equations” short course.
- June 1-4, CMA4CH, Taormina, Italy. JMS to teach short course and talk TBD.
- June 8-12, CAC-XIV, Richmond, VA. NBG and RB to teach “Advanced Preprocessing for Spectroscopic Applications” and “Alternative Modeling Methods in Chemometrics.”
- August 2-8, IDRC, Chambersburg, PA. NBG to attend, talk TBD.
- September 14-18, ICRM, Nijmegen, The Netherlands. NBG to give keynote “An Overview of Hyperspectral Image Analysis in Chemometrics.”
- September 28-October 3, SciX 2014, Reno, NV. JMS Chemometrics Section Chair, talks and courses TBD.
- November 10-13, EigenU Europe, Hillerød, Denmark. Courses led by BMW and Eigenvector Associate Rasmus Bro.
- November 17-19, EAS 2014, Somerset, NJ. EVRI sponsor of Award for Achievements in Chemometrics. Courses and talks TBD.
We’re especially excited about this year’s Eigenvector University. This ninth edition of EigenU will include all our usual events (poster session, PowerUser Tips & Tricks, workshop dinner) plus five new short courses. Special guest Age Smilde will lead “Chemometrics in Metabolomics” and Rasmus Bro will present “Modeling Fluorescence EEM Data.” The other three new courses are “Calibration Model Maintenance,” “PLS_Toolbox Beyond the Interfaces” and “Getting PLS_Toolbox/Solo Models Online.” We expect EigenU 2014 to be an especially fun and fruitful learning experience.
We look forward to working with you in 2014!
Jan 24, 2013
One of the challenges of writing software that works with MATLAB is accommodating an array of versions. For better or worse, not everybody updates their MATLAB regularly. So we have to make our PLS_Toolbox and other toolboxes work with a fairly wide distribution of MATLABs.
To give you some idea of what our developers are up against, the plot below shows the distribution of MATLAB versions among our users for each of the last three years. (Click on the plot to get a much larger .pdf version.)
While the most common version in use at any one time tends to be one of the latest two or three releases, it never peaks at more than 20% of our users. And there are LOTS of users with older versions of MATLAB. Note that the plot goes back ten years to 2003! In 2010, we still had 12% of our users with MATLAB versions from 2005 or earlier. It was only after that dropped to less than 5% that we stopped supporting MATLAB 6.5 and 7.0.1 in our new releases. As shown in our release notes, we currently support MATLAB 7.0.4 (from early 2005) through the current MATLAB 8.0 (R2012b). And with our latest minor update (PLS_Toolbox 7.0.3) we’re ready for R2013a, so you’ll be set when it comes out.
But it is a balancing act. We don’t want to force users to upgrade their MATLAB. We understand that an older version of MATLAB works perfectly well for many users. But often we can’t take advantage of newer MATLAB features until we cut old versions loose. As an example, it would be much easier for our developers to use the newer format for coding objects (such as our DataSet Object) that became available in MATLAB 2008a. Until recently, however, 10% of our users were still working with MATLAB 2007b or older.
Our Chief of Technology Development Jeremy M. Shaver notes: “Moving users to later versions of MATLAB allows us to utilize better graphical interface tools (making our interfaces easier to use and more powerful), modern hardware architecture (allowing faster processing and better memory management), and other new programming functionality (making the code easier for us to support and for our power-users to understand). Plus, having fewer MATLAB versions to support means we have fewer ‘special cases’ to support in the code. We balance this against our users’ inconvenience and cost in order to achieve the best overall result for our customers!”
Well said, Jeremy!
Jan 10, 2013
I got an email from a prospective user of our software the other day that really set me back. Paraphrasing a bit here, it was “Are there any unique features of your PLS algorithm/diagnostics?” The problem with questions like this one is that I never know where to start. But here is what I wrote.
As for “unique features of your PLS algorithm,” well, there are numerous ways to calculate a PLS model, but they all pretty much arrive at the same result (which is good). If you’d like to learn more about PLS algorithms and their accuracy, I suggest you have a look at the series of blog posts I did on the subject.
As to diagnostics, most of the packages use pretty much the same diagnostics, though sometimes they call them by different names. Usually there is a sample distance metric (e.g. T2) and some sort of residual (e.g. Q).
But maybe what you are really looking for is what makes our software unique, rather than our specific PLS algorithm. We have two major packages for chemometrics. The first is our MATLAB-based PLS_Toolbox, the second is our stand-alone product Solo, which is essentially the compiled version of PLS_Toolbox. The two packages provide identical interfaces and share the same model and data formats. The advantage of PLS_Toolbox is that, because it works within the MATLAB environment, it can be run from the command line and functions from it can be incorporated into other analyses. The advantage of Solo is that you don’t have to have MATLAB.
So right off the bat, a unique feature of our software is that there are completely compatible solutions for working with or without MATLAB. And both of these solutions are available on all platforms, including Windows, Mac OSX and Linux. That is unique.
PLS_Toolbox and Solo have the widest available array of analysis methods. This includes PLS and PCA of course, but also PCR, MLR, MCR, PARAFAC, N-PLS, PLS-DA, SIMCA, SVM, KNN, CLS, LWR, MPCA, Cluster Analysis and Batch Maturity. Plus they have a large number of auxiliary tools for Instrument Standardization, Data Transformation, Dynamic Modeling, Sample Selection, Trend Analysis, Correlation Spectroscopy and Design of Experiments. And numerous tools for variable selection including Genetic Algorithm, iPLS and Stepwise MLR. Plus diagnostic methods such as VIP and Selectivity Ratio. The collection of all of these analysis methods and auxiliary functions within one interface is unique.
PLS_Toolbox and Solo can be extended for use with Multivariate Images with MIA_Toolbox and Solo+MIA. The ability to apply such a wide array of multivariate analysis techniques to images is unique. There is also an add-on for the patented Extended Multiplicative Scatter Correction, EMSC_Toolbox. If not completely unique, this method for preprocessing data from highly scattering samples is not widely available.
For on-line application there are our Solo_Predictor and Model_Exporter. Solo_Predictor can be used with any model generated by PLS_Toolbox/Solo and can communicate via TCP/IP sockets, ActiveX, .NET, timed action or wait-for-file. Model_Exporter translates PLS_Toolbox/Solo models into mathematical formulas that can be compiled into other languages. Model_Exporter’s XML output can be parsed for execution in .NET (C#). Additional output formats include MATLAB .m files (compatible with older versions of MATLAB and Octave), plus LabVIEW, Symbion and Tcl. This wide array of on-line options is unique.
Beyond that, PLS_Toolbox and Solo are also extremely flexible tools and include the widest array of data preprocessing methods, with user-specified ordering, the ability to add user-specified methods, and customizable favorites settings.
And finally, price. PLS_Toolbox is only $1395 for industrial users, $395 for academic. Solo is $2195/$695. The price/performance ratio of these products is most certainly unique.
If you have any questions about the specific functionality of our software, please write me.
Nov 8, 2012
Eigenvector’s Chief of Technology Development Dr. Jeremy Shaver is getting ready to head off to the Eastern Analytical Symposium (EAS). He’ll be busy on Sunday and Monday assisting Eigenvector Associate Dr. Don Dahlberg with Chemometrics without Equations (CWE). As I wrote previously, this year the popular CWE is being extended by a day to cover advanced data preprocessing. Jeremy will be demonstrating the methods using the recently released PLS_Toolbox/Solo 7.0. If you’d like to attend, there is still time to register through the conference web site!
Jeremy will also represent EVRI at the session honoring Professor Dr. Lutgarde Buydens of Radboud University Nijmegen for Outstanding Achievements in Chemometrics. The award is, once again, sponsored by Eigenvector Research. The award session, chaired by University of Barcelona’s Dr. Anna de Juan, will start Monday morning at 9:00am.
You might also find Dr. Shaver at the Cobalt Light Systems Ltd booth. Cobalt, one of EVRI’s Technology Partners, develops tools for non-invasive analysis. Their TRS100 pharmaceutical analysis instrument utilizes our Solo software for chemometric modeling. Jeremy will be there to advise users on how to best calibrate the system for their particular needs.
Of course, if you can catch him, Jeremy would be happy to talk to anyone interested in EVRI’s software offerings! He’s the Eigenvectorian most intimately familiar with our products and their features and capabilities. Drop Dr. Shaver an email if you’d like to meet him at EAS.
Have a good week!
Oct 30, 2012
New versions of our MATLAB-based PLS_Toolbox and MIA_Toolbox were released earlier this month, along with updates to our stand-alone packages Solo and Solo+MIA. PLS_Toolbox and its derivatives, Solo and Solo+MIA, are now in version 7.0, while MIA_Toolbox is in version 2.8. As can be seen in the release notes, the list of enhancements and additions is long (as usual!).
Many of the new features are demonstrated in the new EigenGuide video, “What’s New in Version 7.0.” The video illustrates the use of:
- additional information in the Analysis interface, such as error of cross-validation
- interfaces for splitting data sets into calibration and validation sets
- tools for visualizing the difference between samples for both their Q residuals and T2 contributions
- simplified control of plot attributes
- readily available class statistics
- automated peak finding
- tools for finding specific samples and variables based on logical operators
Of particular note in this release is the expansion of the Batch Process Modeling tools. The Batch Processor tool readies data sets for modeling by Summary PCA, Batch Maturity, MPCA, and several PARAFAC variants. It then pushes the data sets into the Analysis tool where the models are developed. To see the Batch Processor and Analysis in action, watch the video. The combination of the Batch Processor and methods supported in the Analysis interface allows modelers to follow most of the pathways outlined in my TRICAP 2012 talk, “Getting to Multiway: A Roadmap for Batch Process Data.”
This release reaffirms EVRI’s commitment to continuous software improvement – it completes our fifth year of semiannual major releases. The best chemometrics software just keeps getting better!
Sep 17, 2012
Autumn is nearly here and with it comes the first semester of the school year. This morning I was greeted by a sign of fall: a slew of student account creation notifications, drifting into my email inbox like falling leaves.
At EVRI, we work with professors to make our software freely available to students enrolled in chemometrics courses. Students can get free 6-month demo licenses of our MATLAB-based PLS_Toolbox and MIA_Toolbox, or stand-alone Solo or Solo+MIA.
I traded emails this morning with Professor Anna de Juan of the Universitat de Barcelona. This will be the third year we’ve worked with Prof. de Juan on her chemometrics class. She wrote, “we had a very good experience the two semesters using PLS_Toolbox in the classroom. The students were happy and only a pair of them had problems of installation at home. It really made it easy that they could play with proposed data sets out of the classroom, at their own pace and exploring many possibilities.” de Juan noted her students typically use Solo at home because they generally don’t have personal copies of MATLAB.
Another advantage of Solo is that it is available for multiple platforms, including Windows, Linux and Mac OS X. I see lots of Apple laptops on college campuses. A search of the web reveals estimates ranging from 30-70% of college students using Macs. We see a substantial number of student downloads of our Solo for Mac, and expect that number to grow.
Interested in teaching Chemometrics? Drop me a line and we’ll be happy to work with you to provide software for your students.
Mar 20, 2012
Updates to our flagship PLS_Toolbox and Solo were released last week; they are now in version 6.7. This is in keeping with our policy, begun in 2008, of releasing significant upgrades twice yearly. Our Multivariate Image Analysis (MIA) tools were also updated with the release of Solo+MIA 6.7 and MIA_Toolbox 2.7.
As the Version 6.7 Release Notes show, the number of additions, improvements and refinements is (once again!) rather long. My favorite new features are the Drag and Drop import of data files, Confusion Table including cross-validation results for classification problems, and Custom Color-By values for plotting.
PLS_Toolbox/Solo can import a wide variety of file types, and the list continues to grow. Drag and Drop importing allows users to drag their data files directly to the Browse or Analysis windows. They will be loaded and ready for analysis. For instance, users can drag a number of .spc files directly into Analysis. Forget some files or have additional files in a different directory? Just drag them in and they will be augmented onto the existing data.
The Confusion Table feature creates several tables summarizing the classification performance of models. These include a “confusion matrix” giving the fractions of true positive, false positive, true negative, and false negative samples, and a confusion table giving the number of samples in each combination of actual and predicted class. Tables are calculated for both the full fitted model and for the cross-validation results. The tables can be easily copied and pasted, saved to file, or included in the Report Writer output as HTML, MS Word or PowerPoint files.
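For readers new to the terminology, the counts behind such a confusion table are easy to compute; here is a minimal Python sketch with made-up class labels (the PLS_Toolbox implementation does far more, of course):

```python
# Rows are actual classes, columns are predicted classes.
actual    = ["A", "A", "B", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C", "C"]

classes = sorted(set(actual))
counts = {a: {p: 0 for p in classes} for a in classes}
for a, p in zip(actual, predicted):
    counts[a][p] += 1          # tally each (actual, predicted) pair

for a in classes:
    print(a, [counts[a][p] for p in classes])
```

Each off-diagonal count is a misclassification; dividing a row by its total gives the per-class fractions reported in the confusion matrix.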
With Custom Color-By, users can color points in scores and loadings plots using any currently loaded data or new data loaded from the workspace. For instance, samples in a PLS LV-2 versus LV-1 scores plot can be colored by the scores on another LV, their actual or predicted y values, leverage, Q residual, a specific X-variable, an additional Y-variable, or any custom variable from the workspace. This allows deeper investigation into the cause of specific variations seen in the data.
Want to find out more about our latest releases? Create an account in our system and you’ll be able to download free 30-day demos. Want prices? No need to sit through a webinar! Check our price list page, which includes all our products; just click Academic or Industrial.
As always, users with current Maintenance Agreements can download the new versions from their accounts.
Questions? I’d be happy to answer them or refer you to our development team. Just email me!
Dec 6, 2011
Our Chief of Technology Development Jeremy M. Shaver received a very nice letter this morning from Balázs Vajna, who is a Ph.D. student at Budapest University of Technology and Economics. As you’ll see from the references below, he is a very productive young man! Here is his letter to Jeremy, highlighting how he used PLS_Toolbox in his work:
I would like to thank you for all your help with the Eigenvector products. With your help, I was able to successfully carry out detailed investigations using chemical imaging and chemometric evaluation in such a way that I could publish these results in relevant international journals. I would like to draw your attention to the following publications where (only) PLS_Toolbox was used for chemometric evaluation:
- B. Vajna, I. Farkas, A. Farkas, H. Pataki, Zs. Nagy, J. Madarász, Gy. Marosi, “Characterization of drug-cyclodextrin formulations using Raman mapping and multivariate curve resolution,” Journal of Pharmaceutical and Biomedical Analysis, 56, 38-44, 2011.
- B. Vajna, H. Pataki, Zs. Nagy, I. Farkas, Gy. Marosi, “Characterization of melt extruded and conventional Isoptin formulations using Raman chemical imaging and chemometrics,” International Journal of Pharmaceutics, 419, 107-113, 2011.
These may be considered as showcases of using PLS_Toolbox in Raman chemical imaging, and – which is maybe even more interesting in the light of your collaboration with Horiba Jobin Yvon – the joint use of PLS_Toolbox and LabSpec. The following studies have also been published where MCR-ALS and SMMA (Purity) were carried out with PLS_Toolbox and were tested along with other curve resolution techniques.
- B. Vajna, G. Patyi, Zs. Nagy, A. Farkas, Gy. Marosi, “Comparison of chemometric methods in the analysis of pharmaceuticals with hyperspectral Raman imaging,” Journal of Raman Spectroscopy, 42(11), 1977-1986, 2011.
- B. Vajna, A. Farkas, H. Pataki, Zs. Zsigmond, T. Igricz, Gy. Marosi, “Testing the performance of pure spectrum resolution from Raman hyperspectral images of differently manufactured pharmaceutical tablets,” Analytica Chimica Acta, in press.
- B. Vajna, B. Bodzay, A. Toldy, I. Farkas, T. Igricz, G. Marosi, “Analysis of car shredder polymer waste with Raman mapping and chemometrics,” Express Polymer Letters, 6(2), 107-119, 2012.
I just wanted to let you know that these publications exist, all using PLS_Toolbox in the evaluation of Raman images, and that I am very grateful for your help throughout. I hope you will find them interesting.
Department of Organic Chemistry and Technology
Budapest University of Technology and Economics
8 Budafoki str., H-1111 Budapest, Hungary
Thanks, Balázs, your letter just made our day! We’re glad you found our tools useful!
Nov 21, 2011
Although it was shown previously that PCA can be used to perfectly impute missing values in rank-deficient, noise-free data, it’s not hard to guess that PCA might be suboptimal for imputing missing elements in real, noisy data. The goal of PCA, after all, is to estimate the data subspace, not to predict particular elements. Prediction is typically the goal of regression methods, such as Partial Least Squares. In fact, regression models can be used to construct estimates of any and all variables in a data set based on the remaining variables. In our 1989 AIChE paper we proposed comparing those estimates to actual values for the purpose of fault detection. Later this became known as regression-adjusted variables, as in Hawkins, 1991.
There is a little-known function in PLS_Toolbox, plsrsgn, (present since the first version in 1989 or ’90), that can be used to develop collections of PLS models in which each variable in a data set is predicted by the remaining variables. The regression vectors are mapped into a matrix that generates the residuals between the actual and predicted values, in much the same way as the I–PP′ matrix from PCA.
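The construction can be sketched in Python/NumPy as follows, with ordinary least-squares submodels standing in for the PLS submodels purely for illustration (plsrsgn itself builds PLS models, and its actual interface differs):

```python
import numpy as np

# Rank-5, noise-free data with 8 variables (mean-centered)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 8))
X = X - X.mean(axis=0)

n = X.shape[1]
M = np.zeros((n, n))  # column j holds the coefficients predicting variable j
for j in range(n):
    others = [k for k in range(n) if k != j]
    # least-squares submodel: variable j from the remaining variables
    coef, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
    M[others, j] = coef

R = np.eye(n) - M     # residual-generating matrix, analogous to I - PP'
resid = X @ R         # actual minus predicted, one column per variable
print(np.abs(resid).max())  # essentially zero for noise-free, rank-deficient data
```

Multiplying a data matrix by this one matrix evaluates all of the submodels at once, just as multiplying by I–PP′ evaluates the PCA residuals.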
We can compare the results of using these collections of PLS models to using the PCA done previously. Here we created the coeff matrix using (a conservative) 3 LVs in each of the PLS submodels. Each submodel could of course be optimized individually, but for illustration purposes this will be adequate. The reconstruction error of the PLS models is compared with that of PCA in the figure shown at left, where the error for the collection of PLS models is shown in red, superimposed over the PCA model reconstruction error in blue. The PLS models’ error is lower for each variable, in some cases substantially so, e.g. variables 3-5.
The second figure, at left, shows the estimate of variable 5 for both the PLS (green) and PCA (red) methods compared to the measured values (blue). It is clear that the PLS model tracks the actual value much better.
Because the estimation error is smaller, collections of PLS models can be much more sensitive to process faults than PCA models, particularly individual sensor faults.
It is also possible to replace missing variables based on these collections of PLS models in (nearly) exactly the same manner as in PCA. The difference is that, unlike in PCA, the matrix which generates the residuals is not symmetric, so the R12 term (see part one) does not equal R21′. The solution is to calculate b using their average, thus
b = 0.5(R12 + R21′)R11⁻¹
Curiously, unlike the PCA case, the residuals on the replaced variables will not be zero except in the unlikely case that R12 = R21′.
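A rough Python/NumPy illustration (with ordinary least-squares submodels standing in for the PLS submodels, and a single missing variable so that R11 is a scalar and the ordering of the terms is immaterial):

```python
import numpy as np

# Rank-4 data with 6 variables plus a little noise, so the submodels disagree slightly
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 6))
X = X + 0.05 * rng.standard_normal(X.shape)
X = X - X.mean(axis=0)

n = X.shape[1]
M = np.zeros((n, n))  # column j predicts variable j from the others
for j in range(n):
    others = [k for k in range(n) if k != j]
    coef, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
    M[others, j] = coef
R = np.eye(n) - M     # nonsymmetric residual generator

# Replace variable 1 (index 0) using the average of R12 and R21'
p = 1
R11, R12, R21 = R[:p, :p], R[:p, p:], R[p:, :p]
xg = X[0, p:]
xb = -xg @ (0.5 * (R21 + R12.T)) @ np.linalg.inv(R11)

x_rep = np.concatenate([xb, xg])
print(xb[0], X[0, 0])       # estimate vs. actual value
print((x_rep @ R)[:p])      # residual on the replaced variable: small, but nonzero
```

Because the noise makes R12 and R12′ differ slightly from R21′ and R21, the residual on the replaced variable does not vanish exactly, as noted above.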
In the case of an existing single PLS model, it is of course possible to use this methodology to estimate the values of missing variables based on the PLS loadings. (Or, if you insist, on the PLS weights. Given that residuals based on weights are larger than residuals based on loadings, I’d expect better luck reconstructing from the loadings but I offer that here without proof.)
In the next installment of this series, we will consider the more challenging problem of building models on incomplete data records.
B.M. Wise, N.L. Ricker, and D.J. Veltkamp, “Upset and Sensor Failure Detection in Multivariate Processes,” AIChE Annual Meeting, 1989.
D.M. Hawkins, “Multivariate Quality Control Based on Regression Adjusted Variables,” Technometrics, Vol. 33, No. 1, 1991.
Nov 11, 2011
In Missing Data (part one) I outlined an approach for in-filling missing data when applying an existing Principal Components Analysis (PCA) model. Let us now consider when this approach might be expected to fail. Recall that missing data estimation results in a least-squares problem with solution:
xb = −xgR21R11⁻¹
In our short courses, I advise students to be wary any time a matrix inverse is used, and this case is no exception. Inverses are defined only for matrices of full rank, and may be unstable for nearly rank-deficient matrices. So under what conditions might we expect R11 to be rank deficient? Recall that R11 is the part of I–PP′ that applies to the variables we want to replace. Problems arise when the variables to be replaced form a group that is perfectly correlated within itself but not with any of the remaining variables. When this happens the variables will either be (1) included as a group in the PCA model (if enough PCs are retained) or (2) excluded as a group (if too few PCs are retained). In case 1, R11 is rank deficient and the inverse isn’t defined. In case 2, R11 is just I, but the loadings of the correlated group are zero, so the R12 part of the solution is 0. In either case, it makes sense that a solution isn’t possible: what information would it be based on?
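Case 1 is easy to reproduce. In the Python/NumPy sketch below (illustrative only), two variables are exact copies of each other and independent of the rest; with enough PCs retained to include the pair, the R11 block comes out singular:

```python
import numpy as np

# Variables 1 and 2 are exact copies: perfectly correlated with each other,
# uncorrelated (in expectation) with the remaining three variables.
rng = np.random.default_rng(1)
z = rng.standard_normal((200, 1))
X = np.hstack([z, z, rng.standard_normal((200, 3))])
X = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:4].T                 # enough PCs to include the correlated pair (case 1)
R = np.eye(5) - P @ P.T
R11 = R[:2, :2]              # block for the two variables we would like to replace

# One O(1) singular value and one at machine precision: R11 is effectively singular
print(np.linalg.svd(R11, compute_uv=False))
```

With the pair in the model, the only residual direction touching those two variables is their difference, so R11 collapses to (nearly) a rank-one matrix and the inverse in the replacement formula is unavailable.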
With real data, of course, it is highly unlikely that R11 will be rank deficient to within numerical precision (or that R12 will be zero). But it certainly may happen that R11 is near rank deficient, in which case the estimates of the missing variables will not be very good. Fortunately, in most systems the measured variables are somewhat correlated with each other and the method can be employed.
In their 1995 paper, Nomikos and MacGregor estimated the value of missing variables using a truncated Classical Least Squares (CLS) formulation. The PCA loadings are fit to the available data, leaving out the missing portions, to estimate scores which are then used to estimate missing values. This reduces to:
xb = xgPg(Pg′Pg)⁻¹Pb′
where Pb and Pg refer to the part of the PCA model loadings for the missing (bad) and available (good) data, respectively. In 1996 Nelson, Taylor and MacGregor noted that this method was equivalent to the method in our 1991 paper but offered no proof. The proof can be found in “Refitting PCA, MPCA and PARAFAC Models to Incomplete Data Records” from FACSS, 2007.
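The equivalence can also be checked numerically; here is a Python/NumPy sketch on simulated rank-deficient data (illustrative only; the variable names are mine):

```python
import numpy as np

# Simulated rank-5 data, 20 variables, 5-PC PCA model
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 20))
X = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:5].T
R = np.eye(20) - P @ P.T

p = 3                                   # treat the first p variables as missing
R11, R21 = R[:p, :p], R[p:, :p]
Pb, Pg = P[:p, :], P[p:, :]

xg = X[0, p:]                           # good part of a sample
# minimum-Q solution (part one of this series)
xb_minQ = -xg @ R21 @ np.linalg.inv(R11)
# truncated CLS: fit the scores to the good data, then project to the bad variables
t = xg @ Pg @ np.linalg.inv(Pg.T @ Pg)
xb_cls = t @ Pb.T
print(np.allclose(xb_minQ, xb_cls))     # the two estimates agree
```

On this noise-free, rank-deficient data both routes also recover the true values of the missing variables exactly, as part one showed for the minimum-Q solution.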
So how does this work in practice? The topmost figure shows the estimation error for each of the 20 variables in the melter data based on a 4-PC model with mean-centering. The model was estimated with every other sample and tested on the remaining samples. The estimation error is shown in units of Relative Standard Deviation (RSD) relative to the raw data. Thus, variables with error near 1.0 aren’t being predicted any better than just using the mean value, while variables with error below 0.2 are tracking quite well. An example is shown in the middle figure, which shows temperature sensor number 8 actual (blue line) and predicted (red x) for the test set as a function of sample number (time).
The reason for the large differences in ability to replace variables in this data set is, of course, directly related to how independent the variables are. A graphic illustration of this can be produced with the PLS_Toolbox corrmap function, which produced the third figure. The correlation matrix for the temperatures is colored red where there is high positive correlation, blue for negative correlation, and white for no correlation. It can be seen that variables with low estimation error (e.g. 7, 8, 17, 18) are strongly correlated with other variables, whereas variables with high estimation error (e.g. 2, 12) are not correlated strongly with any other variables.
To summarize, we’ve shown that missing variables can be imputed based on an existing PCA model and the available measurements. The success of this approach depends upon the degree to which the missing variables are correlated with the available variables, as might be expected. In the next installment of this Missing Data series, we’ll explore using regression models, particularly Partial Least Squares (PLS), to replace missing data.
P. Nomikos and J.F. MacGregor, “Multivariate SPC Charts for Monitoring Batch Processes,” Technometrics, 37(1), pps. 41-58, 1995.
P.R.C. Nelson, P.A. Taylor and J.F. MacGregor, “Missing data method in PCA and PLS: Score calculations with incomplete observations,” Chemometrics & Intell. Lab. Sys., 35(1), pps. 45-65, 1996.
B.M. Wise, “Re-fitting PCA, MPCA and PARAFAC Models to Incomplete Data Records,” FACSS, Memphis, TN, October, 2007.
Nov 5, 2011
Over the next few weeks I’m going to be discussing some aspects of missing data. This is an important topic in chemometrics, as many applications suffer from incomplete data. Missing data is especially common in process applications, where there are many independent sensors.
I got interested in missing data while in graduate school in the late 1980s. I worked a lot with a prototype glass melter for the solidification of nuclear fuel reprocessing waste. The primary measurements were temperatures provided by thermocouple sensors. The very high temperatures in this system, nearing 1200C (~2200F), caused the thermocouples to fail frequently. Thus it was common for the data record to be incomplete.
Missing data is also common in batch process monitoring. There are several approaches for building models on complete, finished batches. However, it is most useful to know if batches are going wrong BEFORE they are complete. Thus, it is desirable to be able to apply the model to an incomplete data record.
Missing data problems can be divided into two classes: 1) those involving missing data when applying an existing model to new data records, and 2) those involving building a model on an incomplete data record. Of these, the first problem is by far the easiest to deal with, so we will start with it. It will, however, illustrate some approaches which can be modified for use in the second case. These approaches can also be used for other purposes, such as cross-validation of Principal Component Analysis (PCA) models.
Consider now the case where you have a process that periodically produces a new data vector xi (1 x n), along with a validated PCA model with loadings Pk (n x k). The residual sum-of-squares, or Q statistic, can be calculated for the ith sample as Q = xiRxi′ where R = I–PkPk′. For the sake of convenience, imagine that the first p variables in this model are no longer available, but the remaining n–p variables are measured as usual. Thus, x can be partitioned into a group of bad variables xb and a group of good variables xg, x = [xb xg]. The calculation of Q can then be broken down into parts which do and do not involve missing variables:
Q = xbR11xb′ + xgR21xb′ + xbR12xg′ + xgR22xg′
where R11 is the upper left (p x p) part of R, R21 is the lower left ((n–p) x p) section (with R12 = R21′), and R22 is the lower right ((n–p) x (n–p)) section.
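This partitioning is just a rearrangement of Q = xRx′, which is easy to verify numerically; a quick Python/NumPy sketch (illustrative only):

```python
import numpy as np

# A small numerical check of the partitioned form of Q
rng = np.random.default_rng(4)
n, k, p = 8, 3, 2
P = np.linalg.qr(rng.standard_normal((n, k)))[0]   # orthonormal loadings (n x k)
R = np.eye(n) - P @ P.T

x = rng.standard_normal(n)
xb, xg = x[:p], x[p:]
R11, R12 = R[:p, :p], R[:p, p:]
R21, R22 = R[p:, :p], R[p:, p:]

Q_full = x @ R @ x
Q_parts = xb @ R11 @ xb + xg @ R21 @ xb + xb @ R12 @ xg + xg @ R22 @ xg
print(np.isclose(Q_full, Q_parts))   # True
```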
It is possible to solve for the values of the bad variables xb that minimize Q, as shown in our 1991 paper referenced below. The (incredibly simple) solution is
xb = −xgR21R11⁻¹
Unsurprisingly, the residuals on the replaced variables on the full model will be zero.
It is easy to demonstrate that this method works perfectly in the rank deficient, no noise case. In MATLAB, you can create a rank 5 data set with 20 variables, then use the Singular Value Decomposition (SVD) to get a set of PCA loadings P, and from that, the R matrix.
>> c = randn(100,5);
>> p = randn(20,5);
>> x = c*p';
>> [u,s,v] = svd(x);
>> P = v(:,1:5);
>> R = eye(20)-P*P';
Now let’s say the sensor associated with variable 5 has failed. We can use the replace function to generate a matrix Rm which replaces it based on the values of the other variables.
>> Rm = replace(R,5,'matrix');
>> imagesc(sign(Rm)), colormap(rwb)
Rm has the somewhat curious structure shown in the figure above. The white area is zeros, the diagonal is ones, and R21R11⁻¹ for the appropriately rearranged R is mapped into the vertical section.
We can try Rm out on a new data set that spans the same space as the previous one, and plot up the results as follows:
>> newx = randn(100,5)*p’;
>> var5 = newx(:,5);
>> newx(:,5) = 0;
>> newx_r = newx*Rm;
>> plot(var5,newx_r(:,5),'+b'), dp
The (not very interesting) figure at left shows that the replaced value of variable 5 agrees with the original value. This can be done for multiple variables.
In the second installment of this Missing Data series I’ll give some examples of how this works in practice, discuss limitations, and show some alternate ways of estimating missing values. In the third installment we’ll get to the much more challenging issue of building models on incomplete data sets.
B.M. Wise and N.L. Ricker, “Recent advances in Multivariate Statistical Process Control, Improving Robustness and Sensitivity,” IFAC Symposium on Advanced Control of Chemical Processes, pps. 125-130, Toulouse, France, October 1991.
Nov 2, 2011
Eigenvector Vice-president Neal B. Gallagher and Chief of Technology Development Jeremy M. Shaver will present Using the Advanced Features in PLS_Toolbox/Solo 6.5 in New Brunswick, NJ on December 8-9, 2011. The course will be held at the Hyatt Regency.
With PLS_Toolbox and Solo Version 6.5 released last month, this is an opportune time to attend this course. Participants will learn how to take advantage of many of the recently added tools. It will also be a great time to ask “how to” type questions. Nobody knows our software more intimately than Jeremy, as he is responsible for its overall development. He’s constantly surprising the rest of us EigenGuys by showing us easier ways to accomplish our modeling tasks using features we didn’t know existed! Neal will be on hand to guide users through many of the methods, particularly the advanced preprocessing features. Neal has extensive experience in this area due to his work with remote sensing applications.
The course includes an optional second half day which covers our tools for Multivariate Image Analysis and Design of Experiments. There will also be time for one-on-one consulting with the software. Attendees are encouraged to bring their own data for this! Often all the methods and tools make a lot more sense when applied to data with which you are familiar.