Category Archives: Chemometrics

Chemometics news and issues.

Eigenvector Awarded SBIR Contract

Jul 8, 2009

We got some good news last week when we found out we won an SBIR (Small Business Innovation Research) award to work with NIST (National Institute of Standards and Technology). The project will focus on continued advancement of the Temperature-Programmed Sensing (TPS) system developed at NIST by Steve Semancik and co-workers.

The TPS system is a multi-sensor array, typically with different materials on the individual sensing elements. But there is a twist: the TPS system allows for temperature control of the individual chemical sensor elements. TPS sensors are also known as “micro-hotplate sensors.” The ability to do temperature programming, i.e. vary the sensing element temperature as a function of time in a prescribed way, provides many opportunities for optimization. In addition, proper utilization of the sensor’s time/temperature response may lead to the realization of the “second order advantage” typically found in much larger and more complex analytical systems such as GS-MS.

We’re excited about working on this project and trust that our collective experience (100+ man-years of chemometrics consulting!) will lead to some improvements in the system performance.

I’ve included below the project title and abstract.

Chemometric Support for Temperature-Programmed Sensing System

The Temperature-Programmed Sensing (TPS) system developed at NIST presents many opportunities and unique challenges. The data output from the system can be quite complex and there are many opportunities to optimize the system for specific sensing scenarios. We propose a program aimed at characterizing the system so that potential problems (such as system drift) can be solved early so that the full potential of the system can be realized. The plan includes studies on the stability and theoretical functionality of the sensors. This will result in procedures for instrument standardization and data base-lining. After this is accomplished, advanced preprocessing methods will be considered, along with the use of multi-way (“second order”) data modeling methods for use in calibration and classification. Finally, procedures for optimizing the system for specific applications will be developed.

BMW

Referencing Software – Summary

Jun 27, 2009

The post I wrote on June 11, Referencing Software, resulted in a rather lengthy thread on ICS-L, the Chemometrics discussion list. Most discussants generally agreed that the software used to develop results for scientific papers should be referenced, including the software title, publisher, year and version number.

There were a few dissenters. Sergey Kucheryavski wrote:

…it makes sense only in two cases.
1. The software implements some unique, patented methods that are not available in any other programs.
2. There is a special agreement (caused by special price [e.g. PLS toolbox for 100 USD] or some other benefits) between user and a software company.

In a similar vein, Richard Brereton wrote:

If however software has been bought (often at considerable cost) I do not think this obliges the originator to cite it, there is no ethical requirement, unless the software was purchased at a specially low price…

I find this to be a rather interesting concept and wonder what would happen if we applied it to books as well. If I paid the $1595 for Comprehensive Chemometrics by Brown, Tauler and Walczak, do I not have to reference it? How about Brereton’s “Applied Chemometrics for Scientists?” Do I have to reference it because it’s only $110? Or do I have to reference it only if I get a discount off the list price? Clearly this is ridiculous.

Most of the respondents felt that it was important to reference software in order to assure reproducibility of the work. Philip Hopke wrote:

It seems to me that the fundamental issue is the ability of the reader of a paper to be able to fully replicate the work that has been published. If work is not replicable, then it is not science. The issue then is what is required to ensure repeatability of the … work being reported.

And I agree that this is the most important issue, but it is not the only issue. Bruce Kowalski wrote:

If ya’ll want a real debate start taking issue with publications that don’t mention the first paper in a research area. There has always been a correct way to use references. What went wrong???????

Referencing the first paper in a research area is all about giving credit where it’s due. It’s not about reproducibility, (which, as Cliff Spiegelman observed, is often better served by referencing later work, often done by someone other than the original author). Likewise, referencing software is also partly about giving credit where it’s due–recognizing the effort expended to produce a useful tool of scientific value.

BMW

Referencing Software

Jun 11, 2009

Yesterday I picked up a newly-arrived journal and noted that the article highlighted on the front page looked quite interesting as we have been doing some related work. I eagerly turned to the article and found that the author had been using PLS-DA (Partial Least Squares Discriminant Analysis) and was, in fact, one of our PLS_Toolbox users. Imagine my disappointment when I could find no reference to our software in the article! The article had many references, with more than 50 journal articles cited and at least a half dozen equipment manufacturers named. But there was no mention of the software that was used to turn the measurements into the results that were presented.

I checked with the author of the article, and yes, our software was used to produce the results that were re-plotted in another application prior to publication. But exactly whose software was used is beside the point. The point is that software is a critical part of the experiment and, in order to ensure reproducibility, should be referenced.

Some might say that referencing the original work on the development of the particular analysis method implemented in the software should suffice, (though in this instance that wasn’t done either, the author referenced previous work of their own where they used PLS-DA). I’d argue that isn’t enough. The problem is that it typically takes a LOT of additional work to turn a method from an academic paper into a working program. There are often many special (sometimes schizophrenic) cases that must be handled properly to assure consistent results. Sometimes various meta-parameters must be optimized. Preprocessing can be critical. And then there is the whole interface which allows the user to interact with the data so that it can be effectively viewed, sorted and selected.

So why do I care? Obviously, there is the commercial aspect: having our software referenced is good advertising for us, and leads to more sales. But beyond that, (like many other publishers of scientific software, I’m sure), our software is probably our most significant work of scholarship. To not reference it is to not acknowledge the contributions we’ve made to the field.

So I’m calling on all authors to reference the software they use, and editors and reviewers to check that they do. Listing it in the references would be preferred. Software is, after all, a publication, with a title, version number (edition), publisher, and year of publication. Often, authors are also known, and can be listed. But one way or the other, software should be cited as it is critical to reproducibility and represents scholarly work upon which results depend. Referencing software upholds scientific and scholarly tradition and it is academically dishonest to not do so.

BMW

Use of Unlabeled Data in Regression Modeling

Jun 7, 2009

In 1995 Edward V. Thomas published “Incorporating Auxiliary Predictor Variation in Principal Components Regression” in J. Chemometrics, (vol. 9, no. 6, pps 471-481). Thomas demonstrated how additional samples, without matching references values, can be used when building a PCR model. These samples, commonly called “unlabeled data,” help stabilize the estimates of the principal components. So their use often results in slightly better models than using only the labeled data.

While at EPFL last fall, I was working with Paman Gujral, Michael Amrhein and Dominique Bonvin considering methods for updating regression models. Often, one of the problems with updating models is lack of reference values, i.e. all the new data is unlabeled. Thus, it seemed natural to see how “Edward’s PCR” worked in this situation.

The result is the study “On the bias-variance trade-off in principal component regression with unlabeled data,” which will be presented this week as a poster at SSC-11. The study shows that “Edward’s PCR” works great if the new data is still in the same subspace as the old data. This might occur, for instance, in spectroscopic applications where the same set of analytes still exists but their range has been expanded. But when new analytes are thrown into the mix, thus expanding the subspace, this method leads to even larger prediction biases than not updating models at all. This is because new unlabeled samples rotate the PCs, and ultimately the regression vector, more towards the new analytes while not having any reference values to tell the model to ignore this subspace.

Hope you enjoy the poster, and I hope everybody has a good week at SSC-11!

BMW

EAS Chemometrics Award Session for Romà Tauler

Jun 4, 2009

As announced in a previous post, Romà Tauler was selected as this year’s recipient of the EAS Award for Achievements in Chemometrics. Romà asked me to organize the award session, and I have happily obliged. The session will be on Tuesday afternoon, November 17. The theme of the session is “Uncertainties, Ambiguities, and Chemometrics.” Talks will include:

• Peter Wentzell, Dalhousie University, “Exploratory Data Analysis with Noisy Data”
• Anna de Juan, Universitat de Barcelona, “Using Noise Structure Knowledge in MCR Process Analysis”
• Willem Windig, Eigenvector Research, Inc., “How Being Negative Can Be Good”
• Age Smilde, University of Amsterdam, “Modeling Dynamic Metabolomics Data Using Prior Knowledge”
• Romà Tauler, Spanish Council of Scientific Research, “Ambiguities and Error Propagation Effects on Multivariate Curve Resolution Solutions”

We’re looking forward to a great session. See you in the fall!

BMW

EigenU Redux: Chemometrics in Wenatchee, July 13-16

Jun 4, 2009

We had a number of people that just couldn’t make it to EigenU last month and wanted a Chemometrics course this summer. So we’re planning on doing a course here in Wenatchee, WA, July 13-16, 2009. We’re doing our “Basic Chemometrics” course on Monday-Wednesday, including:

Linear Algebra for Chemometricians
MATLAB for Chemometricians
Chemometrics I: Principal Components Analysis
Chemometrics II: Partial Least Squares and Regression

On the optional fourth day, Thursday, we’ll go over some special topics, including:

Instrument Standardization and Calibration Transfer
Variable Selection
Advanced Preprocessing

The cost of the course will be $1475/$650 (industrial/academic) for the first three days, and $475/$225 for the optional fourth day.

Please come with a laptop with either MATLAB + PLS_Toolbox or Solo installed. (The free demo versions are just fine for this.) Let me know if this is a problem and I’ll try to help you out.

For further information and to register, just contact me.

BMW

Chemometrics in Cultural Heritage

May 29, 2009

Last fall I had the pleasure, with Rasmus Bro, to teach a chemometrics course in Rome. The choice of this location was a result of Rasmus just wanting to go to Rome, and me making an email acquaintance of Prof. Giovanni Visco of the University of Rome, (La Sapienza). In 2008 Giovanni was organizing the second CMA4CH meeting, which is a rather un-obvious acronym for “Application of Multivariate Analysis and Chemometry to Cultural Heritage and Environment.” We gave Giovanni a copy of Solo for the Best Presentation Prize at CMA4CH 2008, and a friendship was born. So when we (Rasmus, along with my wife and daughters, had easily convinced me) wanted to do a course in Rome, I contacted Giovanni and he figured it all out for us. Between Giovanni and his colleague Federico Marini, we were very well taken care of during our stay in Rome!

Giovanni is now in the process of organizing CMA4CH 2010, which will be held on the island of Sicily on September 26-29. He was kind enough to ask me to be the Co-chair for Chemometrics, and I gladly agreed. While somewhat specific, this meeting considers in depth a rather important intersection between a scientific method and an application.

Of course, Italy is THE place for a meeting focused on cultural heritage; they have more of it than just about anybody. And there are so many potential applications of chemometric methods in this arena, (identification of artifacts, provenance of origin, fraud detection, effect of climate and pollution, restoration, etc.), that there should be plenty to discuss! We’re looking forward to it.

BMW

Chemometrics and Fortune 500 Companies

May 28, 2009

The other day I was updating my bio for a conference and was working on some sentences regarding our experience teaching chemometrics. It included a reference to teaching employees of Fortune 500 companies. So I decided to try to figure out how many of these companies had sent employees to our courses.

Scanning the list, (and just from memory), I came up with 50+ companies we have either taught in-hours courses for or who have sent people to our open courses, including: 3M Company, Abbott Laboratories, Advanced Micro Devices, Agilent Technologies, Air Products and Chemicals, Alcoa, Amgen, Applied Materials, AT&T, Avery Dennison, Becton, Dickinson & Co., Boston Scientific, Boeing, Bristol-Myers Squibb, Chevron, Colgate-Palmolive, ConocoPhillips, Corning, Delphi, Dow Chemical, E. I. du Pont de Nemours & Co., Eastman Chemical, Eastman Kodak, Eli Lilly & Co., ExxonMobil, Ford, General Electric, General Motors, Goodrich, Goodyear Tire & Rubber, Hershey, Hewlett-Packard, Honeywell, Huntsman, Intel, IBM, International Paper, Johnson & Johnson, Kraft, Lockheed Martin, Lucent, Merck & Co., Micron, Owens Corning, Pfizer, Praxair, Procter & Gamble, Rohm & Haas, Schering-Plough, Sunoco, Texas Instruments, Weyerhaeuser and Wyeth.

First off, I’m pleased that personnel at all of these companies have thought enough of us to come to our courses, (such as EigenU). But beyond that, it shows that chemometrics is an important and widely applicable discipline. In these companies chemometric methods play a critical role in many aspects of their product life cycles, from basic research, through product development and scale-up to manufacturing. Multivariate methods improve efficiency and, therefore, are part of these companies competitive advantage.

BMW

Another EigenU Complete!

May 26, 2009

The fourth edition of Eigenvector University came to a close last Friday, May 22. The week-long EigenU 2009 had 25 participants from a wide variety of industries and universities. This was somewhat smaller than 2008, but still a good showing given the current state of the economy. Of course, we think it’s the really smart companies that use the slow times to improve the skill sets of their employees!

As usual, the kind folks at the WAC took good care of us. Many thanks to Rick, Amanda, Wayne, Joe, Joshua, Timothy, Bernie, Randall, Eddie, Quentin and all the rest that kept us hydrated and well-fed.

Speaking of well-fed, we all enjoyed Thursday evening’s workshop dinner at Torchy’s. Here everyone contemplates the dinner choices while discussing the day’s courses, which included Chuck and Bob’s “Implementing Chemometrics in PAT,” Rasmus’ “Variable Selection,” and Willem’s “Chemometrics in Mass Spectrometry.”

$EigenU 2009 Dinner at Torchy\'s$

Thanks to all the course participants for braving the swine flu panic to join us! Also, to the 8 “Eigenvectorian” instructors (Scott, Jeremy, Willem, Neal, Chuck, Bob, Rasmus and myself) for developing and leading the courses.

EigenU 2010 is tentatively scheduled for May 16-21, 2010. In the mean time, catch one of our courses at SIMS XVII, FACSS or EAS, or contact me to schedule in-house training!

BMW

Congratulations Romà!

Apr 17, 2009

This year’s Eastern Analytical Symposium Award for Achievements in Chemometrics goes to Romà Tauler. Romà is a Research Professor at CSIC, the Institute of Chemical and Environmental Research in Barcelona, Spain.

Romà continues to be a pioneer in Multivariate Curve Resolution, the collection of techniques used for decomposing spectral data into its physically meaningful underlying components. Romà has published an astounding number of papers concerning both the theoretical and practical aspects of MCR, in addition to many other papers in the general field of chemometrics.

Romà is also Editor in Chief of Chemometrics and Intelligent Laboratory Systems.

Professor Tauler joins the previous EAS Chemometrics Award winners (a distinguished group if I may say so myself) listed below:

A special session honoring Romà’s achievements will be presented at this year’s EAS in November.

Eigenvector has been the sponsor of the Chemometrics Award since 2002, and we’re pleased to do it again this year. Congratulations Romà!

BMW

EigenU Registration Open

Apr 9, 2009

The fourth edition of Eigenvector University will be held in Seattle May 17-22. We’re excited about this edition in part because we have four new courses: Robust Methods, Correlation Spectroscopy, Common Mistakes in Chemometrics (and how not to make them), and Implementing Chemometrics in PAT.

But we’re also excited because EigenU is about the only time during the year where we get the whole Eigenvector staff together. This is a benefit for us–we like to see each other and exchange ideas on consulting projects, talk about software development, etc. But it’s also a benefit for the attendees–a chance to talk to all the Eigenvectorians and find whichever one of us has the most experience on your problem.

While we’ve been somewhat worried that the current economy would affect our attendance, I’m pleased to report that registrations are coming in and we’re up to 17 participants as of April 9. Apparently, there are some companies out there that realize that the best time to sharpen the saw is before all the orders for logs come in.

Early registration for EigenU ends on April 17. After that, prices go up, so get your training plan in order now!

BMW

EigenGuys at FACSS in Reno

Oct 20, 2008

This was the first year in a long time that I didn’t make it to FACSS, but that doesn’t mean that Eigenvector wasn’t there. The EigenGuys attending included Neal Gallagher, Jeremy Shaver, Chuck Miller and Scott Koch.

As usual, EVRI taught some courses: Neal took the lead on our popular Chemometrics without Equations, and introduced a new course, Advanced Chemometrics without Equations. As its name implies, ACWE explains concepts such as advanced preprocessing and variable selection in words and pictures rather than equations.

The EigenGuys also gave a number of talks. Jeremy presented “Making Do-Weighted regression models for use with less-than-perfect data.” This work describes a strategy for developing models based on historical data when the most interesting or critical data is underrepresented.

Chuck presented our still-not-quite-complete study of preprocessing and calibration transfer methods, “Combining Calibration Transfer and Preprocessing: What methods, What Order?” The good news is, as far as the examples we have go, it doesn’t matter if you preprocess then do calibration transfer or the other way around. (If you have data where you think it makes a real difference, please drop a line.) Chuck’s other offering, “Analytical Chemistry and Multi-Block Modeling for Improved NIR Spectral Interpretation,” demonstrated how PLS2 can be used to analyze data from multiple analytical instruments in order improve understanding. This deeper knowledge can be used in turn to improve model performance.

Scott headed up the trade show aspect of the conference, manning our booth. Scott’s main task was doing demos of our new PLS_Toolbox 5.0, which was just released last week. Look for Solo 5.0 shortly!

BMW

Properties of PLS

Sep 28, 2008

As part of my sabbatical here at the Automatic Control Laboratory at EPFL, I was asked to give a seminar. I wanted to talk about some of the work I’d done lately concerning properties of PLS, and differences between PLS algorithms, pretty much the same material I’d presented as a poster at CAC-2008 in Montpellier.

I was asked to make the presentation a little more tutorial in nature, so I included more background on multivariate calibration. The result is “Properties of Partial Least Squares Regression and Differences between Algorithms,” which I presented Friday, September 26, 2008. Enjoy!

BMW

Chemometrics Short Course in Rome, October 27-29, 2008

Jul 16, 2008

Some time ago I asked Rasmus Bro if he would be interested in teaching a short course with me in Europe this fall. He said, “Yes, and I really want to go to Rome!” Fortunately, I’d been in contact lately with Dr. Giovanni Visco of Rome University Chemistry Department regarding the CMA4CH meeting.

Dr. Visco has been kind enough to put us in touch with CASPUR, the nearby “Interuniversity Consortium for Supercomputing and Research,” which has good facilities for teaching a computer based course. We’re currently planning on teaching an introductory 3-day course October 27-29, 2008. The course will include:

Obviously, we’ll have to do a little editing of these courses to fit this 4.5 days worth of material into 3 days! If you have questions, please drop me a line (bmw@eigenvector.com).

See you in Rome this fall!

BMW

IUPAC Glossary of Chemometric Terms and Concepts

Jul 16, 2008

Nomenclature has been a subject of some discussion within the chemometrics community, such as on the list ICS-L. I recall exchanges dealing with the definition of various terms such as “factor,” “latent variable,” and “principal component.” Its clear that we don’t all use these terms in exactly the same way. For the most part, this doesn’t bother me. Authors should be free to use terms as they wish provided that they define them unambiguously in their text.

However, it would be useful for the community to have a set of generally agreed upon definitions for commonly used terms and concepts. Enter IUPAC, the International Union of Pure and Applied Chemistry. I remember first hearing of IUPAC when I was an undergrad learning organic chemistry. Learning the IUPAC names for compounds was always straightforward as they were very systematic. This was in contrast to learning common names, which, it seemed at times, were pretty much random.

Professor D. Brynn Hibbert, of the University of New South Wales, has received funding for a small IUPAC project to develop a glossary of concepts and terms in chemometrics. He presented a brief introduction to this project at CAC-2008. His collaborators on this include Professor Pentti Minkkinen, Lappeenranta University of Technology, Dr. Klaas Faber, Chemometry Consultancy, and myself.

The initial project goal is to establish the scope of the problem, and to develop a draft glossary and a consultation process. To do this we plan to set up a “wiki” where members of the community could edit terms or add new ones. We’ve had several offers of existing glossaries which could be used to populate the wiki initially. We’ll do that and then let everybody have at it. The wiki software will keep track of all the edits submitted, so we’ll know what terms are particularly contentious. Once it has settled down, the project team will create a consensus list for eventual presentation to IUPAC.

An IUPAC glossary would make it easier for authors as they could simply state that they will adhere to IUPAC definitions, and thus not have to define terms further. But perhaps more importantly, it would make things easier for students of chemometrics, who could learn a common set of terms and then only have to worry about the exceptions as they come up. Ultimately, it should be good for the field of chemometrics.

It’s Eigenvector’s job to get the wiki set up. I’ll let you know when it becomes available.

BMW

CAC-2008 Poster Prize Winners

Jul 4, 2008

Eigenvector was pleased to sponsor the “Best Poster” prize at CAC-2008. The top three poster presenters all received a certificate good for a copy of PLS_Toolbox or Solo (well, OK, it wasn’t exactly a certificate, it was one of my business cards with “Good for one PLS_Toolbox” written on the back!). The top poster also got $500USD, which equates to 320€.

There were 160 posters presented at CAC, so this was quite a contest! The winners, selected by the CAC scientific committee, represent some exceptional efforts selected from a very large body of good work.

The third place poster was “Drift compensation of gas sensor array data by Orthogonal Signal Correction” by M. Padilla, A. Perera, I. Montoliu, A. Chaudry, K. Persaud and S. Marco. This is a nice application of OSC. We’ve used it for spectroscopic instrument standardization and found it to work well in that application. It makes sense that it would work well for electronic noses as well.

Second place went to Pat Wiegand, Randy Pell and Enric Comas, all of Dow, for “Simultaneous Variable and Sample Selection for PLS Calibrations Using a Robust Genetic Algorithm.” This work addressed the problem where one has both samples and variables that are irrelevant for building a predictive model for a given property. Most previous work address either the variable selection or the sample selection problem, but not both. The robustness of their algorithm comes, in part, from a robust PLS algorithm from the LIBRA Toolbox, developed by Sabine Verboven and Mia Hubert. This toolbox is what provides the robust options for PCA and PLS in PLS_Toolbox, so of course we think that was a very good choice!

Emma Peré-Trepat accepted the first place prize on behalf of herself and co-workers I. Montoliu, F.P. Martin, S. Rezzi and S. Kochhar, all of Nestlé Research Center. They presented “Data fusion strategies for nutrimetabonomics.” Nutrimetabonomics, the application of metabonomics to nutritional sciences, is the study of metabolic responses to the consumption of specific foods and ingredients. Their approach used hierarchical modeling to fuse NMR and meta-data.

Congratulations again to the winners!

BMW

More from CAC-2008

Jul 4, 2008

Its been a long week, absolutely packed. I haven’t gotten to every session, but I thought I’d include a few notes about several more talks I really enjoyed.

Selena Richards presented “Self-Modeling Curve Resolution: a new approach to recovering temporal metabolite signal modulation in NMR spectroscopic data: Application to a life-long caloric restriction in dogs.” Its been known for some time that restricting caloric intake lengthens the life span of most mammals. This talk is concerned with finding the metabolomic signature of this effect. Besides the novel use of MCR, I enjoyed the talk because the subjects were Labrador Retrievers. We’ve been trying to keep our yellow lab, Jenny, thin, also because she has some joint problems that would be exacerbated if she was over weight. But man, labs will eat anything, so keeping them out of the food can be a challenge! I’m not sure how calorie restriction works in humans, but I’m sure life seems longer!

Steven Short talked on “Determination of Figures of Merit for Near-Infrared and Raman Spectrophotometers by Net Analyte Signal Analysis for a Four Compound Solid Dosage System.” This work discussed how NAS can be used to compare analytical instruments. I took a look at NAS some years ago after Avi Lorber published “Net analyte signal calculation in multivariate calibration.” My main disappointment with NAS, when calculated based on a regression model, is that its a function of the number of factors in the model, and it isn’t particularly useful for picking number of factors. Short gave a nice application of where NAS can be truly useful.

“Resolution of hyperspectral images. Pre-, in- and post-processing” was presented by Anna de Juan. The talk was something of a overview of past work, but really summarized very well many of the possibilities of using MCR in images. Much of this talk is included in her article (with Maeder, Hancewicz and Tauler) “Use of local rank-based spatial information for resolution of spectroscopic images,” J. Chemo, 22, pps 291-298, 2008. I think the work is a good guide for users of PLS/MIA_Toolbox in that it shows a lot of what you can do with the tools.

All in all it was a very good conference. The only down side was that it was sometimes a victim of its own success–there were simply too many talks, posters and people I wanted to talk with to get to them all!

BMW

Update from CAC-2008

Jul 3, 2008

Greetings from Montpellier, where Jeremy and I are attending CAC-2008. We’re now into our third day of the conference, and it has gotten off to a good start. I thought I’d just take a minute and highlight several talks that I really enjoyed.

Brynn Hibbert presented “Analysis of variance of complex data sets using GEMANOVA: An example using kill kinetics data.” GEMANOVA is essentially a variant of PARAFAC, used like ANOVA to determine what effects are significant, but in multi-way data. The talk made me want to make sure that we can get PARAFAC working in this way for our users. The trick is in setting the constraint options, and in automating the building of sequences of models with different constraints. In any case, this talk demonstrates that PARAFAC, in the right hands, is a very powerful and versatile technique.

“New proposals for PCA model building with missing data” was delivered by Alberto Ferrer. As usual, Alberto gave a very clear presentation–a nice talk to listen to. Alberto showed how methods for imputing missing data in PCA models when a model exists can also be used to develop new PCA models in the face of missing data. PLS_Toolbox, incidentally, uses one of these methods. It was also shown that the NIPALS method for building models with missing data does not work well in comparison to the other methods.

I also really enjoyed Henri Tapp’s talk, “OPLS: an ideal tool for interpreting PLS regression models?” Henri discussed, why, in his opinion, there really isn’t much advantage to OPLS, even in interpretability. (It is admitted by its creator, Johan Trygg, that it does not improve predictive ability over conventional PLS.) Another interesting point in Tapp’s talk was the bibliographic survey of papers citing the original OPLS paper, which showed that OPLS is mostly referenced by Umeå/Umetrics authors and Imperial University. I wonder, how much do you suppose the patent on OPLS has to do with this rather in-bred distribution?

My own talk, “Tools for Multivariate Calibration Robustness Testing with Observations on Effects of Data Preprocessing” was reasonably well-received (at least I wasn’t booed off the stage) and sparked some discussion. I’ve learned over the years that a relatively simple talk with some nice graphics is a good thing to present in the right after lunch spot, when conferees are suffering from PLS (post-lunch syndrome). And of course always energetic & enthusiastic Jeremy did a great job with “Automatic Sample Weighting for Inferential Modeling of Historical In-Control Process Data.”

So far, so good. More later!

BMW

NIPALS versus Lanczos Bidiagonalization

Jun 24, 2008

In 2007, Randy Pell, Scott Ramos and Rolf Manne (PRM) ignited a controversy when they published “The model space in PLS regression.” Their paper pointed out that the X-block residuals in different PLS packages were not the same. Specifically, packages which use the NIPALS or SIMPLS method for PLS (including PLS_Toolbox/Solo, Unscrambler and SIMCA-P) produce different residuals than those that use Lanczos Bidiagonalization (primarily Pirouette). PRM claimed that that residuals in NIPALS were “inconsistent” and made the rather inflammatory statement that NIPALS “amounted to giving up mathematics.”

As you might imagine, this has resulted in a considerable amount of activity in the chemometrics community. And it really has been useful because many of us, including myself, have learned quite a bit about PLS, a subject we thought we already understood pretty well.

There will be a crop of articles in the upcoming issue of Journal of Chemometrics on this subject. This will include a letter to the editor by Svante Wold et. al., “The PLS model space revisited,” which takes a theoretical/philosophical look at how PLS via NIPALS is derived and shows that, in this light, it is not inconsistent. Rasmus Bro and Lars Eldén’s contribution, “PLS Works,” shows that while the PLS NIPALS residual space is orthogonal to the model scores, and thus the fitted y-values, this is not true of Bidiag. I understand that there will also be a paper in the upcoming issue from Rolf Ergon, though I don’t know the title yet.

The work of Bro and Eldén served as a launching point for an investigation of my own regarding how and why Bidiag residuals are correlated with scores. The result is a poster which I will show at CAC-2008 next week, “Properties of PLS, and Differences between NIPALS and Lanczos Bidiagonalization.” The poster shows why and when NIPALS and Bidiag residuals are different, and shows some examples of when Bidiag residuals are strongly correlated with the scores. This includes the main example given in PRM, where, as it turns out, the main difference in the residuals is due to the 3rd factor in the Bidiag model being quite correlated with the residuals.

If you are attending CAC, please drop by and talk to me during the poster presentation. I’m sure we’ll have a lively discussion!

BMW

References:
R. J. Pell, L. S. Ramos and R. Manne, “The model space in PLS regression,” J.Chemometrics, Vol. 21, pps 165-172, 2007.
R. Bro and L. Eldén, “PLS Works,” J. Chemometrics, in press, 2008.
S. Wold, M. Høy, H. Martens, J. Trygg, F. Westad, J. MacGregor and B.M. Wise, “The PLS model space revisited, J. Chemometrics, in press, 2008.
B.M. Wise, “Properties of PLS, and Differences between NIPALS and Lanczos Bidiagonalization,” CAC-2008, Montpellier, France, 2008.

CAC-2008 in Montpellier, France

Jun 23, 2008

The Eleventh Conference on Chemometrics in Analytical Chemistry, CAC-2008, begins next week in Montpellier, France. The conference runs from June 30 through July 4.

All indications are that it will be a great conference. The organizers say that attendance is will be close to 350, which must be a record for CAC.

Eigenvector will be there, of course. Our Jeremy Shaver will present Automatic Sample Weighting for Inferential Modeling of Historical In-Control Process Data, which is concerned with the problem of developing calibration models from data where the bulk of the samples are tightly clustered, with only a few samples exhibiting significant variation.

I’ll be there as well, presenting Tools for Multivariate Calibration Robustness Testing with Observations on Effects of Data Preprocessing. We all want calibration models that are robust, and thus, have good longevity. But how do you tell how brittle a model is? This talk demonstrates some tools for assessing model performance in the face of changes in the samples and instruments.

I’m also presenting a poster, Properties of PLS, and Differences between NIPALS and Lanczos Bidiagonalization. I’ll write about this a little more in my next post, but suffice it to say that there is a bit of controversy of late about various algorithms for Partial Least Squares Regression and the residuals they generate.

Eigenvector is of course proud to be a sponsor of CAC. We are sponsoring the Best Poster Contest, and will present the winner with $500USD (about 322€ today). I personally really like poster sessions. Its a great time to really talk with people about their research, and its generally much more of an exchange of scientific ideas than a talk, which are primarily one-way communications.

So, if you are going to CAC, look us up. Jeremy and I are always happy to answer questions about our products and services, and are always looking for user input on features for PLS_Toolbox, Solo, etc.

See you at CAC!

BMW

Events

Category Archives: Chemometrics

Jul 8, 2009

Jun 27, 2009

Jun 11, 2009

Jun 7, 2009

Jun 4, 2009

Jun 4, 2009

May 29, 2009

May 28, 2009

May 26, 2009

Apr 17, 2009

Apr 9, 2009

Oct 20, 2008

Sep 28, 2008

Jul 16, 2008

Jul 16, 2008

Jul 4, 2008

Jul 4, 2008

Jul 3, 2008

Jun 24, 2008

Jun 23, 2008