Author Archives: Barry M. Wise

About Barry M. Wise

Co-founder, President and CEO of Eigenvector Research, Inc. Creator of PLS_Toolbox chemometrics software.

I hope this isn’t a trend

Jul 11, 2011

On April 21, I received an announcement and call for papers from the organizers of ICRM 2011. The announcement noted that the scientific committee would start to select the abstracts for oral and poster presentations starting May 1. On April 28 I submitted an abstract for an oral presentation and received an automated reply noting that a decision would be made by July 1.

On May 9 I received a note from the conference chair, which stated, “Only abstracts of those who have registered by May 20th and paid their fee within 30 days of registration (and at least before June 20th) will be considered by the scientific committee for acceptance.” When I asked the organizers what would happen if I registered for the conference, and then decided to withdraw if my paper was not accepted, the reply included the cancellation policy, which states: “Registrations cancelled after July 1st 2011 or for no-shows at the conference will remain payable at full charges.”

So, at ICRM, in order to have your paper considered, you must pay the conference registration in full. And if they decide not to accept it, and you decide not to attend, you won’t get a refund.

It seems to me that the decision to accept a paper should be based on scientific merit, not financial considerations. At ICRM, the scientific committee is actually more of an economic committee. I sincerely hope this isn’t the start of a trend among scientific conferences.

BMW

Solo 6.3 Released

Jun 30, 2011

Version 6.3 of our Solo software products was released this week. This includes the core Solo package, Solo+MIA and Solo+Model_Exporter. This release includes significant improvements to the underlying libraries that make Solo faster and more memory friendly, make better use of multi-core processors, and give it a more modern look and feel. Details can be found on the Solo 6.3 Release Notes page.

Solo+Model_Exporter 6.3 incorporates the latest Model_Exporter 2.5 which adds support for more preprocessing methods, improved memory support, and includes the freely-distributable .NET prediction engine (Model_Exporter Interpreter) for 3rd party integration. For detailed information, please see the Model_Exporter 2.5 Release Notes.

I asked our Chief of Technology Development, Jeremy M. Shaver, a few questions about the release.

BMW: Can Solo users expect to see performance gains?

JMS: If you use a system with multiple cores, or regularly work with data that is graphically-intensive or generally large, you should expect to see performance gains. If you are using Windows Vista or Windows 7, you can expect to see interfaces which are more consistent with the operating system’s overall look and feel.

BMW: What does this mean for people trying to get models on-line?

JMS: The changes to Solo+Model_Exporter include more support for models with complex preprocessing and improved memory handling. It also includes the new freely distributable .NET object, Model_Exporter Interpreter. This object can be integrated into third party .NET applications to easily get exported Solo models on-line. There are no royalty payments associated with either exporting a model or distributing the Interpreter.

BMW: Any impact for hand-held devices?

JMS: The Model_Exporter improvements in memory management also make exported Solo models more hand-held device friendly. In addition, the Model_Exporter Interpreter makes integrating these models into existing applications much easier.

For more information about what’s been added to Solo (and PLS_Toolbox) in recent releases please see the PLS_Toolbox/Solo Release Notes.

Solo users with current maintenance contracts can upgrade now. Free demos are also available to those with accounts in our system. Don’t have an account? Start by creating one.

BMW

Rasmus Bro Wins Wold Medal

Jun 9, 2011

The 10th Wold Medal was presented at the Scandinavian Symposium on Chemometrics, SSC12, in Billund, Denmark, this evening. Professor Rasmus Bro accepted the award from past winner Johan Trygg of Umeå University. Rasmus credited Lars Munck for developing the group at the University of Copenhagen and enabling him to flourish there. He also mentioned the late Sijmen de Jong as one of the great people he worked with and dedicated the award to him.

The Wold Medal, named for Herman Wold, father of Partial Least Squares (PLS) methods, honors individuals who have contributed greatly to the field of chemometrics. Rasmus is a worthy recipient, having published over 100 papers in the field, particularly in multi-way analysis. Prof. Bro is also known for his teaching, his MATLAB-based multi-way software, and for being a great ambassador for the field. Rasmus is shown below with the Wold medal.

Past recipients of the Wold medal include Svante Wold, Agnar Höskuldsson, Harald Martens, John MacGregor, Rolf Carlson, Olav Kvalheim, Pentti Minkkinen, Michael Sjöström and Johan Trygg.

Congratulations Rasmus! Well deserved!

BMW

EigenU Poster Session Winners

May 25, 2011

The Sixth Annual Eigenvector University was held last week, and with it, our Tuesday evening PLS_Toolbox User Poster Session. As always, the poster session allowed EigenU participants to discuss chemometric applications over hors d’oeuvres and drinks.

This year’s winners were Kellen Sorauf of the University of Denver and Christoph Lenth of Laser-Laboratorium Göttingen e.V. Kellen, shown below receiving his iPod nano, presented Distribution Coefficient of Pharmaceuticals on Clay Mineral Nanoparticles Using Multivariate Curve Resolution with co-authors Keith E. Miller and Todd A. Wells.

Christoph, shown below, presented Qualitative trace analytics of proteins using SERS with co-authors W. Huttner, K. Christou and H. Wackerbarth.

Poster session winners received the traditional iPod nano (16GB), engraved with “EigenU 2011 Best Poster.”

Thanks to everyone who attended and presented!

BMW

Ready for EigenU 2011

May 14, 2011

The Sixth Annual Eigenvector University starts tomorrow, May 15. The computers are set up, the catering is finalized and the notes are printed. We’re ready to go! I’m always glad when we get to this point. The prep work gets tedious sometimes but I always enjoy the classes.

Once again we’ll have 40+ participants from North America and Europe. We’re looking forward to some good interaction with a diverse set of students. I see we have enrollees from the pharmaceutical industry, medical device companies, national laboratories, food products, mining, and instrument manufacturers. And of course a strong contingent of academicians. Should be a good mix!

Perhaps my favorite part of teaching is when somebody asks a really good question. One that makes you think hard and delve into the background of the methods we work with. Hopefully we’ll get some of those; they provide good fodder for this blog!

I’ll report back this week with any interesting developments.

BMW

Eigenvectorians Hit the Road

May 6, 2011

The next six months promise to be a busy time for the Eigenvectorians. We’ll be hitting the road to eight conferences in the US and Europe. I’ve listed below the conferences we’ll be attending, the talks and courses we’ll be giving, and our exhibits. (Note: not all talks have been accepted yet!)

SSC12, Billund, Denmark, June 7-10

  • A Guide to Orthogonalization Filters Used in Multivariate Image Analysis and Conventional Applications, by Barry M. Wise (BMW), Jeremy M. Shaver (JMS) and Neal B. Gallagher (NBG)
  • We will also offer Advanced Preprocessing at SSC12 on June 7

AOAC PacNW, Tacoma, Washington, June 21-22

  • Simultaneous Analysis of Multivariate Images from Multiple Analytical Techniques, by JMS, and Eunah Lee of HORIBA Scientific
  • We’ll also have an exhibit table at AOAC, so please stop by!

NORM 2011, Portland, Oregon June 26-29

  • A Guide to the Orthogonalization Filter Smörgåsbord, by BMW, JMS and NBG

SIMS XVIII, Riva del Garda, Italy, September 18-23

  • Deconvolving SIMS Images using Multivariate Curve Resolution with Contrast Constraints, by BMW and Willem Windig (WW)

ICRM, Nijmegen, The Netherlands, September 25-29

  • Analysis of Hyperspectral Images using Multivariate Curve Resolution with Contrast Constraints, by BMW and WW

FACSS, Reno, Nevada, October 2-7

  • Multivariate Modeling of Batch Processes Using Summary Variables, by BMW and NBG
  • Detection of Adulterants in Raw Materials Utilizing Hyperspectral Imaging with Target and Anomaly Detection Algorithms, by Lam K. Nguyen and Eli Margilith of Opotek, NBG and JMS
  • A Comparison of Detection Strategies for Solids and Organic Liquids on Surfaces Using Long Wave Infrared Hyperspectral Imaging, by NBG, and Thomas A. Blake and James F. Kelly of PNNL
  • Visualizing Results: Data Fusion and Analysis for Hyperspectral Images, by JMS, Eunah Lee and Karen Gall
  • EVRI will also offer Advanced Preprocessing and Intro to Multivariate Image Analysis at FACSS
  • EVRI will also be at the FACSS Exhibit in Booth #29

AIChE Annual Meeting, Minneapolis, MN, October 16-21

  • Correlating Powder Flowability to Particle Size Distribution using Chemometric Techniques by Christopher Burcham of Eli Lilly and Robert T. Roginski (RTR)

EAS, Somerset, NJ, November 14-17

  • Orthogonalization Filter Preprocessing in NIR Spectroscopy, by BMW, JMS and NBG
  • Eigenvector is once again proud to sponsor the EAS Award for Achievements in Chemometrics, this year honoring Beata Walczak
  • BMW will assist Don Dahlberg with a new short course, Intermediate Chemometrics without Equations
  • EVRI can also be found at the EAS Exposition in Booth #508

Now that you know where we’re going to be, we hope you’ll stop by and say hello!

BMW

iPods Ordered!

Apr 25, 2011

EigenU 2011 is now less than 3 weeks away: it starts on Sunday, May 15. Once again we’ll have the “PLS_Toolbox/Solo User Poster Session” where users can showcase their own chemometric achievements. This Tuesday evening event is open to all EigenU participants, as well as anyone with a poster which demonstrates how PLS_Toolbox or Solo was used in their work. Presenters of the two best posters will receive Apple iPod nanos for their effort. I ordered them today (16GB, one blue, one orange), engraved with “EigenU 2011 Best Poster.”

Tuesday evening will also include a short User Group meeting. We’ll give a short presentation highlighting the key features in our latest software releases and discuss directions for future development. Here is your chance to give us your wish list!

The poster session will feature complimentary hors d’oeuvres and beverages. This is a great time to mingle with colleagues and the Eigenvectorians and discuss the many ways in which chemometric methods are used.

See you in May!

BMW

Switch from MLR to PLS?

Apr 20, 2011

In a recent post, Fernando Morgado posed the following question to the NIR Discussion List: “When is it necessary to move from traditional Multiple Linear Regression (MLR) to Partial Least Squares Regression (PLS)?” That’s a good discussion starter! I’ll offer my take on it here. I have several answers to the question, but before I get to that, it is useful to outline the differences between MLR and PLS.

MLR, PLS, Principal Components Regression (PCR) and a number of other methods are all Inverse Least Squares (ILS) models. Given a set of predictors, X (m samples by n variables), and a variable to be predicted, y (m by 1), they find b such that the estimate of y is $\hat{y} = Xb$, with $b = X^+ y$. The difference between the methods is that they all use different ways to estimate $X^+$, the pseudoinverse of X.

In MLR, $X^+ = (X^T X)^{-1} X^T$. In PCR, the data matrix is decomposed via Principal Components Analysis (PCA) as $X = T_k P_k^T + E$, where $T_k$ is the (m by k) matrix containing the scores on the first k PCs, $P_k$ is the (n by k) matrix containing the first k PC loadings and $E$ is a matrix of residuals; the pseudoinverse is then $X^+ = P_k (T_k^T T_k)^{-1} T_k^T$. The number of PCs, k, is determined via cross-validation or any number of other methods. In PLS, the decomposition of X is somewhat more complicated, and the resulting inverse is $X^+ = W_k (P_k^T W_k)^{-1} (T_k^T T_k)^{-1} T_k^T$, where the additional parameter $W_k$ (n by k) is known as the weights.
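
To make the notation concrete, here is a minimal MATLAB sketch of the three pseudoinverses applied to simulated data. This is not PLS_Toolbox code; the random X and y, the choice of k, and the bare-bones NIPALS loop used to build the PLS weights are placeholders for illustration only.

```matlab
% A bare-bones illustration of the three pseudoinverses above (not
% PLS_Toolbox code). X and y are simulated placeholders.
m = 50;  n = 10;  k = 3;           % samples, variables, factors
X = randn(m, n);
y = randn(m, 1);

% MLR: X+ = (X'X)^-1 X'
b_mlr = (X' * X) \ (X' * y);

% PCR: X = Tk*Pk' + E via SVD, then X+ = Pk*(Tk'Tk)^-1*Tk'
[U, S, V] = svd(X, 'econ');
Tk = U(:, 1:k) * S(1:k, 1:k);      % scores on the first k PCs
Pk = V(:, 1:k);                    % first k PC loadings
b_pcr = Pk * ((Tk' * Tk) \ (Tk' * y));

% PLS (NIPALS, univariate y): builds the weights Wk along with the scores
% and loadings. Because the score vectors are orthogonal, the regression
% vector below equals X+*y with X+ = Wk*(Pk'Wk)^-1*(Tk'Tk)^-1*Tk'.
Xd = X;  yd = y;
Wk = zeros(n, k);  Pp = zeros(n, k);  q = zeros(k, 1);
for a = 1:k
    w = Xd' * yd;  w = w / norm(w);   % weight vector
    t = Xd * w;    tt = t' * t;       % score vector
    p = Xd' * t / tt;                 % loading vector
    q(a) = (yd' * t) / tt;            % inner coefficient, (t't)^-1 t'y
    Xd = Xd - t * p';                 % deflate X
    yd = yd - t * q(a);               % deflate y
    Wk(:, a) = w;  Pp(:, a) = p;
end
b_pls = Wk * ((Pp' * Wk) \ q);        % y_hat = X * b_pls
```

With k set equal to the rank of X (here n = 10), b_pcr and b_pls reproduce b_mlr to within numerical precision, a point I’ll come back to in answer 4 below.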

With that background covered, we can now consider “When is it necessary to move from traditional Multiple Linear Regression (MLR) to Partial Least Squares Regression (PLS)?”

1) Any time the rank of the spectral data matrix X is less than the number of variables. The mathematical rank of a matrix is well defined as the number of linearly independent rows or columns. It is important because the MLR solution includes the term $(X^T X)^{-1}$ in the pseudoinverse. If X has rank less than the number of variables, $X^T X$ has rank less than its dimension, i.e. is rank deficient, and its inverse is undefined. PCR and PLS avoid this problem by decomposing X and forming a solution based on the large variance (stable) part of the decomposition. From this, it is clear that another answer to the question must be:

2) Any time the data contains fewer samples than variables. This is a common problem in spectroscopy because many instruments measure hundreds or thousands of variables (channels), but acquiring that many samples can be a very expensive proposition. The obvious follow-on question is, “Then why not just reduce the number of variables?” The answer to that is, in short, noise reduction. Averaging of correlated measurements results in a reduction in noise.
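
As a quick illustration of that last point (simulated numbers only, assuming channels that share the same underlying signal but carry independent noise), averaging n such channels cuts the noise standard deviation by roughly a factor of the square root of n:

```matlab
% Simulated: 100 channels measuring the same signal with independent
% noise of standard deviation 0.1.
m = 500;  n = 100;
signal = randn(m, 1);
X = repmat(signal, 1, n) + 0.1 * randn(m, n);
fprintf('noise std, single channel:     %.4f\n', std(X(:, 1) - signal));
fprintf('noise std, %d-channel average: %.4f\n', n, std(mean(X, 2) - signal));
% Expect roughly 0.1 and 0.01, i.e. a sqrt(100) = 10-fold reduction.
```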

But what about the case where m > n but X is nearly rank deficient? That is, where X is of full rank only because it is corrupted by noise? This leads to:

3) Any time the chemical rank of the spectral data matrix X is less than the number of variables. By chemical rank we mean the number of variations in the data that are due to chemical variation as opposed to detector noise and other minor effects.
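
Here is a small simulated example of chemical rank (the three-component mixture and the noise level are made up for the illustration): X is built from three underlying components, so its mathematical rank is full only because of the added noise, and just three singular values stand clearly above the noise floor.

```matlab
% Simulated data: 200 samples, 50 variables, but only 3 chemical components.
m = 200;  n = 50;  k_chem = 3;
C = rand(m, k_chem);                 % "concentrations"
S = rand(k_chem, n);                 % "pure-component spectra"
X = C * S + 0.01 * randn(m, n);      % measurement noise makes X full rank
fprintf('mathematical rank of X: %d\n', rank(X));   % prints 50
sv = svd(X);
semilogy(sv, 'o-');                  % only the first 3 values are large
xlabel('component number'); ylabel('singular value');
```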

So if any of the above three conditions exist, then it is appropriate to move from MLR to a factor-based method such as PLS or PCR. But I’m going to play devil’s advocate here a bit and give one more answer:

4) Always. MLR is, after all, just a special case of PLS and PCR. If you include all the components, you arrive at the same model that MLR gives you. But along the way you get additional diagnostic information from which you might learn something. The scores and loadings (and weights) all give you information about the chemistry.

On the NIR Discussion list, Donald J. Dahm wrote: “As a grumpy old man, I say the time to switch to PLS is when you are ready to admit that you don’t have the knowledge or patience to do actual spectroscopy.” I hope that was said with tongue firmly planted in cheek, because I’d argue that the opposite is true. When you are using PLS or PCR and interrogating the models, you are learning about the chemistry. When you use MLR, you are simply fitting the data.

BMW

“This software is awesome!”

Apr 7, 2011

That’s what Eigenvector’s Vice-President, Neal B. Gallagher, wrote to our developers, Jeremy, Scott and Donal, this morning about the latest version of PLS_Toolbox. So, OK, he may be a little biased. But it underscores a point, which is that all of us at EVRI who do consulting (including Neal, Jeremy, Bob, Randy, Willem and myself) use our software constantly to do the same types of tasks our users must accomplish. We write it for our own use as well as theirs. (Scott pointed out that this is known as “eating your own dog food.”)

Neal went on to say, “Every time I turn around, this package and set of solutions just keeps getting better and more extensive. I am truly convinced that this is the most useful package of its kind on the market. This is due to your intelligence, foresight, and attention to detail. It is also due to your open mindedness and willingness to listen to your users.”

Thanks, Neal. I think you made EVRIbody’s day!

BMW

EigenU 2011 Filling Up!

Mar 31, 2011

The Sixth Edition of Eigenvector University, EigenU 2011, is filling up. Many of our most popular classes are more than half full. This includes Linear Algebra for Chemometricians, MATLAB for Chemometricians, Chemometrics I: PCA, Chemometrics II: Regression and PLS, Variable Selection, and Common Chemometric Mistakes (and how to avoid them).

This year EigenU runs from Sunday, May 15 through Friday, May 20. As before, it will be held at the nation’s premier city athletic club, the Washington Athletic Club, in the heart of Seattle.

Early discount registration ends soon. To receive the early discount rates, registration must be received with payment by April 15.

See you in May!

BMW

Improved Model_Exporter Coming Soon

Mar 29, 2011

EVRI’s Model_Exporter allows users of PLS_Toolbox and Solo to export their Principal Component Analysis (PCA), Partial Least Squares (PLS) Regression and other models in a variety of formats suitable for use in both MATLAB and other environments (e.g. LabSpec, GNU Octave, Symbion, LabVIEW, Java, C++/C#). A new, improved version of Model_Exporter will be released in the coming weeks. The new version will allow models to be more easily applied on platforms with less computer power, such as on hand-held devices.

The new Model_Exporter will include a number of improvements, including a much less memory-intensive encoding of Savitzky-Golay derivatives and smoothing, and similar improvements in the handling of excluded variables. The new version will also support additional preprocessing methods which were released in versions 6.0 and 6.2 of PLS_Toolbox and Solo.

The other big news is that we will also be bundling with Model_Exporter a freely-distributable .NET class to do predictions from exported models. As-is, this class can be integrated into any .NET application (C++/C#/VB) from Microsoft Visual Studio 2005 or later. It can be used and redistributed without any per-use charge (exactly as the exported models are licensed) and we’d be happy to work with you on integrating this class into your end application. If you have interest in this approach, just let our Chief of Technology Development, Jeremy Shaver, or me know.

BMW

PLS_Toolbox or Solo?

Mar 28, 2011

Customers who are new to chemometrics often ask us if they should purchase our MATLAB-based PLS_Toolbox or stand-alone Solo. I’ll get to some suggestions in a bit, but first it is useful to clarify the relationship between the two packages: Solo is the compiled version of the PLS_Toolbox tools which are accessible from the interfaces. Similarly, Solo+MIA is the compiled version of PLS_Toolbox and MIA_Toolbox. The packages share the same code-base, which is great for the overall stability of the software as they operate identically. They are also completely compatible, so PLS_Toolbox and Solo users can share data, models, preprocessing, and results. Models from each are equally easily applied on-line using Solo_Predictor.

The majority of the functionality of PLS_Toolbox is accessible in Solo, particularly the most commonly used tools. Every method that you access through the main Graphical User Interfaces (GUIs) of PLS_Toolbox is identical in Solo. But with Solo you don’t get the MATLAB command line, so there are some command-line only functions that are not accessible. The number of these, however, is decreasing all the time as we make more PLS_Toolbox functionality accessible through the GUIs. And of course all the GUIs in PLS_Toolbox + MIA_Toolbox are accessible in Solo+MIA.

So what to buy? Well, do you already have access to MATLAB? Over one million people do, and it is available in many universities and large corporations. If you have access to MATLAB, then buy PLS_Toolbox (and MIA_Toolbox). It costs less than Solo (and Solo+MIA), includes all the functionality of Solo and then some, and can be accessed from the command line and called in MATLAB scripts and functions.

And if you don’t already have MATLAB? If you only need to use the mainstream modeling and analysis functions, then Solo (and Solo+MIA) will save you some money over purchasing MATLAB and PLS_Toolbox (and MIA_Toolbox). I’d only purchase MATLAB if I needed to write custom scripts and functions that call PLS_Toolbox functions. Honestly, the vast majority of our users can do what they need to do using the GUIs. The stuff you can’t get to from the GUIs is pretty much just for power users.

That said, the big plus about buying MATLAB plus PLS_Toolbox is that you get MATLAB, which is a tremendously useful tool in its own right. Once you start using it, you’ll find lots of things to do with it besides just chemometrics.

Hope that helps!

BMW

New Versions Released

Mar 23, 2011

New versions of our most popular software products were released earlier this month. Our MATLAB-based, full-featured chemometrics package, PLS_Toolbox, is now in version 6.2. The stand-alone version of PLS_Toolbox, Solo, and Solo+MIA for multivariate and hyperspectral image analysis also move to version 6.2. MIA_Toolbox is now in version 2.5.

The functionality of Solo+MIA and MIA_Toolbox has been expanded greatly with the integration of ImageJ, the image processing application developed by the NIH. New tools include particle counting with size and shape analysis, an interactive data navigator for drilling into an image, magnify and cross-section tools, and image alignment. A number of new data importers have also been added (RAW, BIF and ENVI).

For a tour of the new features, highlighting the new MIA tools, please watch this short video. For a complete list of new features, please see the PLS_Toolbox/Solo 6.2 release notes, and the MIA_Toolbox/Solo+MIA release notes.

Users with current maintenance contracts can download the new versions from their account. Free 30-day demos are also available for download.

We trust that the new tools will make you even more productive!

BMW

Happy Anniversary, Jeremy!

Feb 15, 2011

Time flies! Today marks the 10th anniversary of Jeremy Shaver joining Eigenvector Research. There are so many nice things I could say about Jeremy that I’m not sure where to start.

When Jeremy joined us in 2001, it didn’t take long for us to figure out that he was more than merely multi-talented. He rapidly took over as lead developer of our software products. As Chief of Technology Development, Jeremy, more than anyone else at EVRI, has been the driving force behind the updates (PLS_Toolbox versions 3.0 through 6.0) and many of the new products (MIA_Toolbox, Solo, Model_Exporter, etc.) we’ve released. There is a lot of his vision in the way our software works, looks and feels.

There have been lots of times when one of us said, “Wouldn’t it be great if our software did _____?” And then about 2 hours later Jeremy sends us some code to try out, and says, “You mean like this?” And we just laugh in amazement, “Yeah, like that!”

Jeremy also contributes greatly on consulting projects and is an enthusiastic short course instructor. He always leads our PowerUser Tips & Tricks session at EigenU because nobody knows how to get more out of the software than he does.

Besides his chemometric skills, Jeremy is a pilot, a great guitarist (witness this jam session with Rasmus and Randy at EigenU 2010), avid outdoorsman, and devoted husband and dad. And we like to drink beer and hang out with him too!

Congratulations Jeremy. We hope you’ve enjoyed it as much as we have!

BMW

EigenU 2011 Registration Now Open

Jan 25, 2011

The Sixth edition of Eigenvector University will be held May 15-20, 2011. Once again, we’ll be at the fabulous Washington Athletic Club in the heart of downtown Seattle.

We’ve added two new classes this year, “SVMs, LWR and other Non-linear Methods for Calibration and Classification” and “Design of Experiments.” The non-linear methods class will focus on Support Vector Machines for regression and classification, which were added to PLS_Toolbox/Solo version 5.8 (February 2010) along with an interface to Locally Weighted Regression. We’ve found these methods to be quite useful in a number of situations, as have our users. The DOE course will focus on practical aspects of experimental design, including designing data sets for multivariate calibration.

I sometimes say that the secret to Eigenvector Research is data preprocessing, i.e. what you do to the data before it hits a PLS or other multivariate model. Thus, “Advanced Preprocessing” has been expanded to a full day for EigenU 2011. We’ll cover many methods for eliminating extraneous variance, including the “decluttering” methods (Generalized Least Squares Weighting, External Parameter Orthogonalization, etc.) we’ve highlighted recently. “Multivariate Curve Resolution” has also been expanded to a full day in order to better cover the use of constraints and contrast control.

EigenU 2011 will also include three evening events: Tuesday night’s “PLS_Toolbox/Solo User Poster Session” (with iPod nanos for the best two posters), Wednesday night’s “PLS_Toolbox/Solo PowerUser Tips & Tricks,” and the Thursday evening dinner event.

You can register for EigenU through your user account. For early discount registration, payment must be received by April 15. Questions? E-mail me.

See you at EigenU!

BMW

Sijmen de Jong

Nov 8, 2010

Sijmen de Jong passed away on October 30, 2010.

I first heard of Sijmen not long after finishing my dissertation. It was late 1992 or so, and I was working at Battelle, Pacific Northwest National Laboratory (PNNL). Sijmen had gotten a copy of an early version of PLS_Toolbox which was available on the internet via FTP. I received a letter from Sijmen with a number of “suggestions” as to how the toolbox might be improved. As I recall, the letter ran at least 3 pages, and included two floppy disks (5.25 inch!) filled with MATLAB .m files.

I was initially taken aback by the letter, and recall thinking, “Who is this guy?” But it didn’t take me long to figure out that, wow!, there was a lot of good stuff there. Several of the routines were incorporated into the next release of PLS_Toolbox, and modifications were made to several others. I’m sure there is still code of his in the toolbox today!

Sijmen was especially interested in the work I’d done with Larry Ricker on Continuum Regression (CR) [1], as he had just published on another continuously adjustable technique, Principal Covariates Regression (PCovR) [2]. He was also working on his SIMPLS [3] algorithm for PLS around that time. Sijmen suggested that the SIMPLS algorithm might be extended to CR. This started a collaboration which would eventually produce the paper “Canonical partial least squares and continuum power regression” in 2001 [4]. I still think that this is the best paper I’ve ever had the pleasure of being associated with. It won the Unilever R&D Vlaardingen “Author Award,” and I still display the certificate proudly on my office wall.

While CR is primarily of academic interest, SIMPLS has become perhaps the most widely used PLS regression algorithm. The reasons for this are evident from several of my recent posts concerning accuracy and speed of various algorithms. If you have done a PLS regression in PLS_Toolbox or Solo, you have benefited from Sijmen’s work!

Sijmen was the kind of smart that I always wanted to be. He seemed to see clearly through the complex math, understand how methods are related, and see how a small “trick” might greatly simplify a problem. As another colleague put it, “A very clever guy!” Beyond that, he was easy to work with and fun to be around. I regret that I got to be around him socially only a few times.

Rest in peace, Sijmen. You will be missed!

BMW

[1] B.M. Wise and N.L. Ricker, “Identification of Finite Impulse Response Models with Continuum Regression,” J. Chemometrics, 7(1), pp. 1-14, 1993.

[2] S. de Jong and H.A.L. Kiers, “Principal Covariates Regression: Part 1, Theory,” Chemo. and Intell. Lab. Sys., Vol. 14, pp. 155-164, 1992.

[3] S. de Jong, “SIMPLS: an alternative approach to partial least squares regression,” Chemo. and Intell. Lab. Sys., Vol. 18, pp. 251-263, 1993.

[4] S. de Jong, B.M. Wise and N.L. Ricker, “Canonical partial least squares and continuum power regression,” J. Chemometrics, 15(1), pp. 85-100, 2001.

Speed of PLS Algorithms

Nov 1, 2010

Previously I wrote about accuracy of PLS algorithms, and compared SIMPLS, NIPALS, BIDIAG2 and the new DSPLS. I now turn to speed of the algorithms. In the paragraphs that follow I’ll compare SIMPLS, NIPALS and DSPLS as implemented in PLS_Toolbox 6.0. It should be noted that the code I’ll test is our standard code (PLS_Toolbox functions simpls.m, dspls.m and nippls.m). These are not stripped down algorithms. They include all the error trapping (dimension checks, ranks checks, etc.) required to use these algorithms with real data. I didn’t include BIDIAG2 here because we don’t support it, and as such, I don’t have production code for it, just the research code (provided by Infometrix) I used to investigate BIDIAG2 accuracy. The SIMPLS and DSPLS code used here includes the re-orthogonalization step investigated previously.
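
For readers who would like to run this kind of comparison themselves, here is a rough sketch of a timing harness. It is not the benchmark code used for the figures described below; it times a stripped-down NIPALS PLS1 rather than the production simpls.m, dspls.m and nippls.m functions, so only the trends, not the absolute times, should be expected to match.

```matlab
function pls_timing_sketch()
% Rough timing harness: time a 20-LV PLS1 fit over a grid of data shapes.
% Swap the call to nipals_pls1 for the algorithm you actually want to test.
nLV = 20;
for m = [100 1000 10000]             % number of samples
    for n = [10 100 1000]            % number of variables
        X = randn(m, n);
        y = randn(m, 1);
        k = min([nLV, m, n]);
        tic;
        nipals_pls1(X, y, k);
        fprintf('m = %6d, n = %5d, %2d LVs: %8.4f s\n', m, n, k, toc);
    end
end
end

function b = nipals_pls1(X, y, k)
% Minimal NIPALS PLS1: returns the regression vector for k latent variables.
n = size(X, 2);
W = zeros(n, k);  P = zeros(n, k);  q = zeros(k, 1);
for a = 1:k
    w = X' * y;  w = w / norm(w);    % weight vector
    t = X * w;   tt = t' * t;        % score vector
    p = X' * t / tt;                 % loading vector
    q(a) = (y' * t) / tt;            % inner regression coefficient
    X = X - t * p';                  % deflate X
    y = y - t * q(a);                % deflate y
    W(:, a) = w;  P(:, a) = p;
end
b = W * ((P' * W) \ q);
end
```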

The tests were performed on my (4 year old!) MacBook Pro laptop, with a 2.16GHz Intel Core Duo, and 2GB RAM running MATLAB 2009b (32 bit). The first figure, below, shows straight computation time as a function of number of samples for SIMPLS to calculate a 20 Latent Variable (LV) model, for data with 10 to 1,000,000 variables (legend gives number of variables for each line). The maximum size of X for each run was 30 million elements, so the lines all terminate at this point. Times range from a minimum of ~0.003 seconds to just over 10 seconds.

It is interesting to note that, for the largest X matrices, the times vary from 4 to 12 seconds, with the faster times being for the more square X's. It is fairly impressive (at least to me) that problems of this size are feasible on an outdated laptop! A 10-way split cross-validation could be done in less than a minute for most of the large cases.

The second figure shows the ratio of the computation time of NIPALS to SIMPLS as a function of number of samples. Each line is a fixed number of variables (indicated by the legend). Maximum size of X here is 9 million elements (I just didn’t want to wait for NIPALS on the big cases). Note that SIMPLS is always faster than NIPALS. The difference is relatively small (around a factor of 2) for the tall skinny cases (many samples, few variables) but considerable for the short fat cases (few samples, many variables). For the case of 100 samples and 100,000 variables, SIMPLS is faster than NIPALS by more than a factor of 10.

The ratio of computation time for DSPLS to SIMPLS is shown in the third figure. These two methods are quite comparable, with the difference always less than a factor of 2. Thus I’ve chosen to display the results as a map. Note that each method has its sweet spot. SIMPLS is faster (red squares) for the many variable problems while DSPLS is faster (light blue squares) for the many sample problems. (Dark blue area represents models not computed for X larger than 30 million elements). Overall, SIMPLS retains a slight time advantage over DSPLS, but for the most part they are equivalent.

Given the results of these tests, and the previous work on accuracy, it is easy to see why SIMPLS remains our default algorithm, and why we found it useful to include DSPLS in our latest releases.

BMW

Advanced Features in Leuven, Belgium

Oct 4, 2010

Following a successful premiere of our new course, Using the Advanced Features of PLS_Toolbox/Solo 6.0, we’re pleased to announce that we will be repeating the course in Leuven, Belgium on November 22, 2010. The course is being arranged in conjunction with CQ Consultancy.

And, yes! It was a successful premiere! We had 28 students at the initial offering on the University of Barcelona campus. Many, many thanks to Romà Tauler and Anna de Juan for hosting the event, and of course many thanks to the attendees for coming. We used a beta of Solo 6.0 for the course, and I’m happy to say it ran quite smoothly. It should be ready for release by mid-October.

BMW

Mediterranean Tour

Sep 17, 2010

I head off tomorrow morning (way too early!) on a “tour” of the Mediterranean. I’ll be attending The First African-European Conference on Chemometrics, aka Afrodata, in Rabat, Morocco. From there, I’ll go to CMA4CH–Application of MVA and Chemometrics to Cultural Heritage and Environment in Sicily. Then it’s on to Barcelona to teach Using the Advanced Features of PLS_Toolbox.

I’m looking forward to getting out of the office and spending some time with my chemometric colleagues. When I see something of chemometric interest during my travels, I’ll try to get it posted here. I hope to see many of you in the coming weeks!

BMW

One Last Time on Accuracy of PLS Algorithms

Sep 9, 2010

Scott Ramos of Infometrix wrote me last week and noted that he had followed the discussion on the accuracy of PLS algorithms. Given that they had done considerable work comparing their BIDIAG2 implementation with other PLS algorithms, he was “surprised” by my original post on the topic. He was kind enough to send me a MATLAB implementation of the PLS algorithms included in their product Pirouette for inclusion in my comparison.

Below you’ll find a figure that compares regression vectors from NIPALS, SIMPLS, DSPLS and the BIDIAG2 code provided by Scott. The figure was generated for a tall-skinny X-block (our “melter” data) and for a short-fat X-block (NIR of pseudo-gasoline mixture). Note that I used our SIMPLS and DSPLS that include re-orthogonalization, as that is now our default. While I haven’t totally grokked Scott’s code, I do see where it includes an explicit re-orthogonalization of the scores as well.

Note that all the algorithms are quite similar, with the biggest differences being less than one part in 10^10. The BIDIAG2 code provided by Scott (hot pink with stars) is the closest to NIPALS for the tall skinny case, while differing a little more from the other algorithms for the short fat case.
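
For anyone who wants to run this kind of check on their own data, the comparison boils down to a relative-difference calculation between regression vectors. Here is a minimal sketch, using two mathematically equivalent MLR solutions as stand-ins for two PLS implementations (the SIMPLS, DSPLS and BIDIAG2 codes compared in the figure are not reproduced here):

```matlab
% Quantifying "parts in 10^N" agreement between two regression vectors.
m = 200;  n = 50;
X = randn(m, n);  y = randn(m, 1);
b1 = (X' * X) \ (X' * y);            % normal-equations route
b2 = X \ y;                          % QR-based least-squares route
reldiff = norm(b1 - b2) / norm(b1);
fprintf('relative difference: %.2e (about 1 part in 10^%d)\n', ...
        reldiff, round(-log10(reldiff)));
```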

This has been an interesting exercise. I’m always surprised when I find there is still much to learn about something I’ve already been thinking about for 20+ years! It is certainly a good lesson in watching out for numerical accuracy issues, and in how accuracy can be improved with some rather simple modifications.

BMW