
Category Archives: Software

Software news and issues.

EigenGuys at FACSS in Reno

Oct 20, 2008

This was the first year in a long time that I didn’t make it to FACSS, but that doesn’t mean that Eigenvector wasn’t there. The EigenGuys attending included Neal Gallagher, Jeremy Shaver, Chuck Miller and Scott Koch.

As usual, EVRI taught some courses: Neal took the lead on our popular Chemometrics without Equations, and introduced a new course, Advanced Chemometrics without Equations. As its name implies, ACWE explains concepts such as advanced preprocessing and variable selection in words and pictures rather than equations.

The EigenGuys also gave a number of talks. Jeremy presented “Making Do-Weighted regression models for use with less-than-perfect data.” This work describes a strategy for developing models based on historical data when the most interesting or critical data is underrepresented.

Chuck presented our still-not-quite-complete study of preprocessing and calibration transfer methods, “Combining Calibration Transfer and Preprocessing: What methods, What Order?” The good news is, as far as the examples we have go, it doesn’t matter if you preprocess then do calibration transfer or the other way around. (If you have data where you think it makes a real difference, please drop a line.) Chuck’s other offering, “Analytical Chemistry and Multi-Block Modeling for Improved NIR Spectral Interpretation,” demonstrated how PLS2 can be used to analyze data from multiple analytical instruments in order to improve understanding. This deeper knowledge can be used in turn to improve model performance.

Scott headed up the trade show aspect of the conference, manning our booth. Scott’s main task was doing demos of our new PLS_Toolbox 5.0, which was just released last week. Look for Solo 5.0 shortly!

BMW

Software as Automobile

Jul 22, 2008

If your chemometrics software package were an automobile, what would it look like?

Sorry, couldn’t resist! 🙂

BMW

Chemometrics Software Prices

Jun 13, 2008

I was doing a little market research the other day, trying to find out what our competitors charge for their software. As it turns out, it’s somewhat difficult to get prices for several of them.

EVRI publishes its price list, as does Infometrix. So if you want to find out the price of PLS_Toolbox, or the price of Pirouette, it’s just a click away. But if you want to get a price on Unscrambler from CAMO, SIMCA-P+ from MKS/Umetrics, or GRAMS from Thermo Scientific, that’s a little tougher. You have to write for a quote.

So, on Wednesday, June 11, I wrote for quotes on Unscrambler and SIMCA-P+. I’m still waiting to hear back. I’ll let you know if/when I get my quotes.

But, it’s my understanding that SIMCA-P+, Unscrambler, and GRAMS are all priced similarly to Pirouette, which is $4500, or more. We think that our PLS_Toolbox and Solo products are a much better value.

If you already have MATLAB (and more than one million people do), PLS_Toolbox is an absolute steal at $995. And if you don’t, Solo, at $1695, offers the point-and-click interfaces of PLS_Toolbox without requiring MATLAB. Both Solo and PLS_Toolbox are easy to use, offer sophisticated data preprocessing techniques, and include many tools not found in other packages, such as PARAFAC and calibration transfer tools.

And with the current exchange rate of $1.00 = 0.65€, our prices are at historic lows for our European customers.

So if you’re in the market for multivariate software solutions, be sure to check out EVRI. It’s where the value is!

BMW

Network and Floating Licenses

Jun 12, 2008

The majority of the licenses we sell for PLS_Toolbox and our other products are single user licenses, though they might be more correctly called single computer licenses. Our license agreement (which is based on the license agreement used by The MathWorks for their toolboxes) states “This license permits licensee to install and use one copy of the Program on a single computer.” We have no objection to multiple users accessing our software, provided that they do so at the single computer upon which it is installed. Though the license does not expressly allow this, we also don’t object if single users make a copy of our software for their laptop or home computer. We draw the line, however, at multiple users accessing our software from multiple computers. This is clearly outside the single user license.

We do have a cost effective solution, however, for the multiple users/multiple computers situation: network/floating licenses. EVRI has developed its own license server that can be used to limit the number of copies of PLS_Toolbox and Solo currently in use. Thus, a group of users can share a number of licenses. This works well if most users in the group need PLS_Toolbox or Solo only occasionally.
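The idea of a group sharing a fixed number of seats can be sketched in a few lines of Python. This is only an illustration of the concept, not the actual EVRI license server: the `SeatPool` class and its `checkout`/`checkin` methods are hypothetical names invented for this example.

```python
import threading


class SeatPool:
    """Hypothetical floating-license pool: at most `seats` copies in use at once."""

    def __init__(self, seats):
        self.seats = seats
        self._in_use = set()            # users currently holding a seat
        self._lock = threading.Lock()   # guards _in_use against concurrent requests

    def checkout(self, user):
        """Return True if `user` gets (or already holds) a seat, False if none is free."""
        with self._lock:
            if user in self._in_use:
                return True
            if len(self._in_use) >= self.seats:
                return False
            self._in_use.add(user)
            return True

    def checkin(self, user):
        """Release the seat held by `user`, if any."""
        with self._lock:
            self._in_use.discard(user)

    def available(self):
        """Number of seats not currently checked out."""
        with self._lock:
            return self.seats - len(self._in_use)


pool = SeatPool(seats=2)
print(pool.checkout("ann"))   # takes the first seat
print(pool.checkout("bob"))   # takes the second seat
print(pool.checkout("cho"))   # refused while both seats are in use
pool.checkin("ann")
print(pool.checkout("cho"))   # a seat is free again
```

With occasional users, a small number of seats can serve a much larger group, since a seat is only tied up while someone is actually running the software.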

We price our floating licenses at 1.5 times the cost of a single user license, as given on our price list. Thus, a single-seat industrial floating license for PLS_Toolbox is $1495, and a single-seat industrial floating license for Solo is $2545. Multi-seat license discounts begin at 3 seats.

If you have questions about our floating/network licenses or license server, drop me a line at bmw@eigenvector.com. We’d be happy to help you set things up so that more of your users could access our chemometric tools in a cost effective way.

BMW

Pietro Cosentino wins Best Presentation Prize at CMA4CH 2008

Jun 5, 2008

The Multivariate Analysis and Chemometrics Applied to Environment and Cultural Heritage conference, CMA4CH, was held on Ventotene Island, Italy, June 1-4, 2008. This somewhat specialized conference focuses on how multivariate techniques can be used to improve understanding of the environment and cultural artifacts.

EVRI was pleased to be a sponsor of CMA4CH, and provided a copy of Solo 4.2 for the Best Presentation. The prize was won by Pietro Cosentino for “Identification of precious artefacts and stones: the sonic imprint.” The paper presents a new non-invasive technique for fingerprinting objects, such as stones and statues, by measuring their response to vibrations imposed by an external “tweeter.” Co-authors include P. Capizzi, G. Fiandaca, R. Martorana, P. Messina and I. Razo Amoroz, all of the University of Palermo.

Congratulations to Pietro and co-workers on a very interesting piece of work! We trust that they will find Solo quite useful as they go forward and test their method on new samples.

BMW

EigenU 2008 Best Posters

May 11, 2008

This year the Eigenvector University poster session was held in conjunction with the Center for Process Analytical Chemistry’s spring meeting dinner event. The EigenU crowd hopped a bus at the WAC and was whisked out to the University of Washington, where they hooked up with CPAC at the UW Club. Almost 20 posters were presented, about two-thirds of which came from UW/CPAC students, with the balance from EigenU attendees. Posters were judged by the Eigenvectorians in attendance (Neal, Jeremy, Rasmus, Willem, Scott, Chuck, Bob, and myself).

Lead authors for the winning posters were Paroma Chakravarty of the University of Minnesota and Rebecca Milczarek of the University of California at Davis. Each received an Apple iPod nano for their winning efforts.

Paroma’s contribution, “Determination of phase transformation in pharmaceutical solid dosage form (tablet) by multivariate analysis,” described how Principal Components Analysis was used to analyze Raman spectroscopic data to monitor phase transformation between hydrates of thiamine hydrochloride. Her co-authors included Marc Champagne and Leslie King of Eli Lilly and Raj Suryanarayanan of University of Minnesota. Ms. Chakravarty used the PCA routines from PLS_Toolbox, and provided a very succinct interpretation of the results.

Rebecca’s poster, “Assessment of tomato pericarp mechanical damage using multivariate analysis of magnetic resonance images,” demonstrated how MRIs can be used to detect bruising. Bruised tomatoes disintegrate during peeling, leading to loss of product. Co-authors included Mikal E. Saltveit of UC Davis and T. Casey Garvey of ConAgra Foods. The poster was presented by co-author Professor Michael J. McCarthy of UC Davis. Ms. Milczarek used PLS with the image tools from MIA_Toolbox to analyze the data.

Many thanks to all who presented their work or attended the poster session!

BMW

Compiling PLS_Toolbox

Apr 17, 2008

The MathWorks MATLAB® Compiler™ can be used to develop stand-alone applications based on MATLAB code. The current version of the Compiler supports the full MATLAB language, objects (including custom class objects), most of the TMW MATLAB toolboxes and user-developed GUIs. As such, it’s a pretty handy tool for creating custom applications that can be redistributed. The resulting applications run on MATLAB libraries, essentially a “headless MATLAB”, and have the same stability and functionality as the code running under MATLAB.

We’ve been getting an increasing number of inquiries about compiling PLS_Toolbox functions. Yes, it is possible to compile PLS_Toolbox functions: they are completely compatible with the MATLAB Compiler (this is how we create our stand-alone products). However, our standard end-user license agreement expressly forbids recompilation and/or redistribution of PLS_Toolbox .m files. And you must have a special license code from us to make it work.

We offer a variety of recompilation licenses for parts of PLS_Toolbox depending on the functions required and the distribution of the final product. The license required, and the price we charge, is determined by the answers to the following questions.

  • Where and how will the compiled application be distributed? (Inside one company, as part of a product, etc.)
  • How many users will be using the end application?
  • What is the purpose of the application?
  • Does the application make use of any of the graphical user interfaces of PLS_Toolbox? (e.g. Analysis, PlotGUI, Genalg, Browse)

If getting PLS_Toolbox models on-line is your goal, compiling our predictor functions is one way to do it. We’re always happy to help users get more mileage out of their models. But note that there are other ways to get models on-line, including Model_Exporter and Solo_Predictor.

Using PLS_Toolbox code in your application can save hundreds of man-hours of development time compared to starting from scratch. And you know you can trust it because it has been extensively tested and updated by Eigenvector and used by the thousands of PLS_Toolbox users. Let PLS_Toolbox accelerate your next analytical application!

BMW

Student and Academic Licenses

Mar 25, 2008

NOTE: Current pricing can be found here.

I’ve recently fielded a couple of questions about software licenses for class room and academic research use. So I’d like to take this opportunity to outline our policy.

In addition to Industrial licenses for our PLS_Toolbox and Solo products, we also offer Student and Academic licenses, both fully functional.

Student licenses are free, and are actually extended demos, with a six month expiration. They are intended for use by students when taking a class with a chemometric component. Typically, course instructors send us a list of students, with email addresses, who will be using these licenses. We set them up in our system, and send each of them an email with download instructions for the software. Note that, like demos, student licenses only allow download of the p-coded MATLAB m-files, so students can’t view the contents of the files. Aside from that, the software is the same as for our industrial and academic licenses.

Academic licenses are intended for long term academic research (such as thesis research), and are available at greatly reduced prices compared to our industrial rates (e.g. $395 for PLS_Toolbox 4.2 versus $995 industrial). These licenses don’t expire, include one year of support, maintenance and upgrades, and contain the source m-files.

The PLS_Toolbox manual, included as a .pdf with each software download, contains extensive tutorials on linear algebra, use of MATLAB, and of course all the major chemometric techniques, such as PCA, PCR, PLS, multi-way methods, etc. It makes an excellent text, and it’s free!

Let us know if you plan to teach a course with a chemometric component. We’d be happy to work with you to set your class up.

BMW

Support for Mac OS, Linux and Unix

Mar 12, 2008

One of the advantages of working with MATLAB® is that it is available on platforms besides just MS Windows. MATLAB is available for Mac OS, Linux and Unix. Because of that, PLS_Toolbox, MIA_Toolbox and EMSC_Toolbox also run on all these platforms. MATLAB makes these products work across platforms almost seamlessly. And because we Eigenvectorians also use Windows, Mac and Linux, we catch and fix any places where the translation between platforms isn’t perfect.

We’ve also developed our stand alone products Solo and Solo+MIA for both Mac OS and Windows. They’re available to download for trial or purchase right now!

As far as I know, this makes us the only maker of dedicated chemometrics software with true cross-platform solutions. Why do we do it? Well, it could have something to do with the fact that the president of this company really likes his Mac. But we believe you should have a choice of computing platforms, and there is no reason why they shouldn’t all play together nicely. Mac OS, Windows, Linux or Unix–you can trust PLS_Toolbox!

BMW

…and another thing about Model_Exporter

Feb 28, 2008

We recently got a chance to test the .m file output of Model_Exporter with National Instruments’ LabVIEW. The Full version of LabVIEW includes MathScript, which is quite compatible with MATLAB for basic calculations. We’ve just verified that PLS_Toolbox and Solo models will work with LabVIEW applications after exporting them with Model_Exporter. You don’t have to have a copy of MATLAB present to run them.

This is good news for LabVIEW users, who can now take full advantage of the power of PLS_Toolbox/Solo for their on-line applications!

So how easy is it?

Simple! Once you have a model in the Analysis GUI, just select File/Export Model/To Predictor/Predictor M-file (see image below). Save the model to an m-file.

Next, set up LabVIEW to create a variable called “x” in MathScript containing the measured data and call the exported m-file to make the prediction. The result would be several variables in LabVIEW including the predicted value, along with T2 and Q values, scores, contributions, etc. Any of these could then be plotted, or used for whatever control action is desired.

With Model_Exporter, that’s all it takes to get your PLS_Toolbox/Solo models running in LabVIEW!
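For readers curious about what such an exported recipe computes, here is a rough Python sketch of applying a linear latent-variable model to one measurement. It is not the actual Model_Exporter output: the field names (`x_mean`, `R`, `P`, `q`, `score_var`) and all of the numbers are made up for illustration, and the only preprocessing assumed is mean-centering.

```python
def apply_model(x, model):
    """Apply a linear latent-variable recipe to one measurement vector x."""
    n_vars = len(x)
    # Preprocessing: mean-center with the calibration mean.
    xc = [xi - mi for xi, mi in zip(x, model["x_mean"])]
    # Scores: project the centered measurement onto each latent variable.
    t = [sum(xc[j] * r[j] for j in range(n_vars)) for r in model["R"]]
    # Prediction: regress on the scores, then add back the calibration mean of y.
    y_hat = model["y_mean"] + sum(ta * qa for ta, qa in zip(t, model["q"]))
    # Hotelling T^2: squared scores scaled by their calibration variances.
    t2 = sum(ta ** 2 / va for ta, va in zip(t, model["score_var"]))
    # Q residual: squared length of the part of xc the loadings cannot reconstruct.
    x_fit = [sum(t[a] * model["P"][a][j] for a in range(len(t))) for j in range(n_vars)]
    q_res = sum((xc[j] - x_fit[j]) ** 2 for j in range(n_vars))
    return {"y_hat": y_hat, "scores": t, "T2": t2, "Q": q_res}


# Made-up one-latent-variable model on three channels (illustrative values only).
model = {
    "x_mean": [1.0, 2.0, 3.0],
    "y_mean": 10.0,
    "R": [[0.6, 0.8, 0.0]],      # one row of projection weights per latent variable
    "P": [[0.6, 0.8, 0.0]],      # one row of loadings per latent variable
    "q": [2.0],                  # regression coefficient for each score
    "score_var": [4.0],          # calibration variance of each score
}

result = apply_model([1.6, 2.8, 3.5], model)
print(result)
```

Every step is plain arithmetic on stored coefficients, which is why a recipe like this can run in environments that have no copy of MATLAB.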

BMW

[Image: moz-screenshot-85.jpg]

Model Transparency, Validation & Model_Exporter

Feb 20, 2008

With the advent of the US Food and Drug Administration’s (FDA) Process Analytical Technology (PAT) Initiative, the possibilities for putting multivariate models on-line in pharmaceutical applications increased dramatically. In fact, the Guidance for Industry on PAT lists Multivariate tools for design, data acquisition and analysis explicitly as PAT Tools. This opens the door for the use of analytical techniques which rely on multivariate calibration to produce estimates of product quality. An example of this would be using NIR with PLS regression to obtain concentration of API in a blending operation.

That said, any multivariate model that is run in a regulated environment is going to have to be validated. I found a good definition of validate on the web: To give evidence that a solution or process is correct. So how do you show that a model is correct? It seems to me that the first step is to understand what it is doing. A multivariate model is really nothing more than a numerical recipe for turning a measurement into an answer. What’s the recipe?

Enter Model_Exporter. Model_Exporter is an add-on to our existing multivariate modeling packages PLS_Toolbox and Solo. Model_Exporter takes models generated by PLS_Toolbox and Solo and turns them into a numerical recipe in an XML format that can be implemented in almost any modern computer language. It also generates m-scripts that can be run in MATLAB or Octave, and Tcl for use with Symbion.

But the main point here is that Model_Exporter makes models transparent. All of the mathematical steps (and the coefficients used in them), including preprocessing, are right there for review. Is the model physically and chemically sensible? Look and see.

The next step in validation is to show that the model behaves as expected. This would include showing that, once implemented, the model produces the same results on the training data as the software that produced the model. One should also show that the model produces the same (acceptable) results on additional test sets that were not used in the model development.
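That kind of check is easy to automate. The Python sketch below compares predictions from the modeling software with those from the deployed implementation, sample by sample. The function name, the tolerance and the numbers are all assumptions made up for illustration; an appropriate tolerance depends on the application.

```python
def compare_predictions(reference, deployed, tol=1e-6):
    """Return (index, reference, deployed) for each sample that differs by more than tol."""
    if len(reference) != len(deployed):
        raise ValueError("prediction lists must have the same length")
    return [
        (i, r, d)
        for i, (r, d) in enumerate(zip(reference, deployed))
        if abs(r - d) > tol
    ]


# Made-up numbers: predictions on the training data from the modeling software
# and from the on-line implementation of the exported model.
reference = [4.98, 5.12, 5.07]
deployed = [4.98, 5.12, 5.0700004]

mismatches = compare_predictions(reference, deployed)
print("mismatches:", mismatches)
```

The same function can be run on the additional test sets; an empty list means the two implementations agree to within the chosen tolerance.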

What about the software that produced the model to begin with? Should it be validated? Going back to the definition of validate, that would require showing that the modeling software produced answers that are correct. OK, well, for PLS regression, correct would have to mean that it calculates factors that maximize the covariance between scores in X and scores in Y. That’s great, but what does it have to do with whether the model actually performs as expected or not? Really, not much. Does that mean it’s not important? No, but assuring software accuracy won’t assure model performance.

Upon reading a draft of this post, Rasmus wrote:

Currently software validation focuses on whether the algorithm achieves what’s claimed, e.g. that a correlation is correctly calculated. This is naturally important and also a focus point for software developers anyhow. However, this sort of validation is not terribly important for validating that a specific process model is doing what it is supposed to. This is similar to thoroughly checking the production facility for guitars in order to check that Elvis is making good music. There are so many decisions and steps involved in producing a good prediction model and the quality of any correlation estimates in the numerical algorithms are of quite insignificant importance compared to all the other aspects. Even with a ‘lousy’ PLS algorithm excellent models could be made if there is a good understanding of the problem.

So when you start thinking about preprocessing options and how many ways there are to get to models with different recipes but similar performance, and also how it’s possible, by making bad modeling choices, to get to a bad model with software that’s totally accurate, it’s clear that models should be validated, not the software that produces them. And that’s why Model_Exporter is so useful: it makes models transparent, which simplifies model validation.

PLS_Toolbox/Solo Maintenance Program

Feb 19, 2008

You may have noticed that Eigenvector has recently changed the way software upgrades are handled. Our old model was based on the version number of the software. With PLS_Toolbox, we had always charged for every other upgrade of version number of 0.1 or greater. For instance, if you bought PLS_Toolbox 1.5, you got 2.0 for free, but had to pay to upgrade to 2.1. Once upgraded to 2.1, you got 3.0 for free, and so on. This was a little confusing to users at times, because the version didn’t go up in even increments, and the time between upgrades was variable as well (see A History of PLS_Toolbox).

We’re now moving to a maintenance contract model that specifies a time period for support, including free upgrades and maintenance releases. Why the change? Well, first off, we’re now planning on releasing upgrades on a more regular schedule, approximately twice a year. We have a long list of new features and methods that we’ll be adding to PLS_Toolbox and Solo, and we want to get those to you as soon as possible. So version number changes are going to accelerate. Secondly, we have a large number of users who would prefer a model that provided predictable costs. This way they can budget for continued upgrades and maintenance. Beyond that, The MathWorks uses this model for MATLAB and users are accustomed to it.

So how are we going to price it? The general rule is that maintenance will cost approximately 20% of the new license cost for industrial users ($199/year for single PLS_Toolbox licenses) and 25% a year for academics ($99/year). Users can purchase yearly maintenance by logging into their account and selecting it from the list under the Buy/Upgrade tab.

For industrial users, an upgrade under the old system ($495) is the same cost as 2.5 years of maintenance under the new system. But with our more rapid pace of development, we expect you’ll see at least 4 significant upgrades in that time. So the value to our customers should be better than ever.
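The arithmetic behind those figures is quick to check. The short Python sketch below uses only prices quoted on this blog ($995 industrial and $395 academic for PLS_Toolbox, $199 and $99 per year for maintenance, $495 for an old-style upgrade); the function names are made up for illustration.

```python
def maintenance_rate(annual_fee, license_price):
    """Annual maintenance as a fraction of the new license price."""
    return annual_fee / license_price


def years_covered(upgrade_price, annual_fee):
    """Years of maintenance that cost the same as one old-style upgrade."""
    return upgrade_price / annual_fee


print(round(maintenance_rate(199, 995), 3))   # industrial: 0.2
print(round(maintenance_rate(99, 395), 3))    # academic: 0.251
print(round(years_covered(495, 199), 2))      # 2.49
```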

The policy for Solo will mirror the policy for PLS_Toolbox.

Please write if you have questions!

BMW

Some Thoughts on Freeware in Chemometrics

Feb 6, 2008

Once again there is a discussion on the chemometrics listserv (ICS-L) concerning freeware in chemometrics. There have been some good comments, and it’s certainly nice to see some activity on the list! I’ll add my thoughts here.

On Feb 5, 2008, at 3:39 PM, David Lee Duewer wrote:
I and the others in Kowalski’s Koven built ARTHUR as part of our PhuDs; it was distributed for a time as freeware. It eventually became semi-commercial as a community of users developed who wanted/needed help and advice. Likewise, Barry’s first PLS_Toolbox was his thesis and was (maybe still is?) freeware.

No, it’s not freeware, but it is still open source. One of my pet peeves is that “freeware” and “open-source” are often used synonymously, but they aren’t the same thing.

PLS_Toolbox is open source, so you can see exactly what it’s doing (no secret, hidden meta-parameters), and you can modify it for your own uses. (Please don’t ask us to help you debug your modified code, though!) You can also compile PLS_Toolbox into other applications IFF (if and only if) you have a license from us for doing so. And of course PLS_Toolbox is supported, regularly updated, etc. etc. If something doesn’t work as advertised, you can complain to us and we’ll fix it, pronto.

I think we occupy a sweet spot between the free but unsupported (must rely on the good will of others) model and the commercial but closed source (not always sure what it’s doing and can’t modify it) model.

OK, end of commercial!

But the problem with freeware projects is that there have to be enough people involved in a quite coordinated way in order to reach the critical mass required to make a product that is very sophisticated. Yes, it’s possible for a single person or a few people to make a bunch of useful routines (e.g. PLS_Toolbox 1.4, ca 1994). But a fully gui-fied tool that does more than a couple things is another story. PLS_Toolbox takes several man-years per year to keep it supported, maintained and moving forward. And if it weren’t based on MATLAB, it would be considerably more.

On Feb 5, 2008, at 4:17 PM, Scott Ramos wrote:
… the vast majority of folk doing chemometrics fall into Dave’s category of tool-users. This is the audience that the commercial developers address. Participants in this discussion list fall mostly into the tool-builder category. Thus, the discussion around free or shareware packages and tools is focused more on this niche of chemometricians.

And that’s the problem. Like it or not, chemometrics is a bit of a niche market. So getting enough people together to make freeware that is commercial-worthy, that tool-users are willing to rely on, is going to be even tougher than for other, broader markets. The most successful opensource/freeware projects that I’m aware of are tools for software developers themselves: tools by software geeks for software geeks. Tools for CVS are a great example of this (like the copy of svnX that I use, and hey, WordPress, which I’m using to write this blog).

MATLAB is interesting in that it occupies a middle ground: it is both a development environment and an end-user tool. You can pretty much say the same for PLS_Toolbox.

On Feb 5, 2008, at 2:32 PM, Rick Dempster wrote:
I was taught not to reinvent the wheel many years ago and that point seems to have stuck with me.

That’s good practice. But it seems to me that a substantial fraction of the freeware effort out there really is just reinventing things that exist elsewhere. The most obvious example is Octave, which is a MATLAB clone. I notice that most of the freeware proponents out there have .edu and .org email addresses, and likely don’t have the same perspective as most of us .com folks do on what it’s worth doing ourselves versus paying for. And they might get credit in the academic world for recreating a commercial product as freeware:

On Feb 5, 2008, at 2:50 PM, Thaden, John J wrote:
…but I can’t help dreaming of creating solutions to my problems that I can also share with communities facing similar problems — part of this is more than a dream, it’s the publish-or-perish dictum of academia…

Isn’t that what Octave is really all about? At this point it is just starting to get to the functionality of the MATLAB 5.x series (from ~10 years ago?). This is pretty obvious if you read Bjørn K. Alsberg and Ole Jacob Hagen, “How octave can replace MATLAB in chemometrics”, ChemoLab, Volume 84, pp. 195-200, 2006. I’d like Octave to succeed, heck, we could probably charge more for PLS_Toolbox if people didn’t have to pay for MATLAB too. But at this point using Octave would be like writing with charcoal from my fireplace because I didn’t want to pay for pencils. The decrease in productivity wouldn’t make up for the cost savings on software. I don’t know about some of the other freeware/opensource packages discussed, such as R, but one should think hard about cost/productivity trade-offs before launching into a project with them.

Thanks for stopping by!

BMW

A History of PLS_Toolbox

Jan 24, 2008

I started graduate school at the University of Washington Department of Chemical Engineering in the Fall of 1985. Sometime around Fall of 1986 somebody showed me MATLAB. Wow. That was the last day I ever wrote anything in Basic or Fortran–it was MATLAB from there on out. In 1987 I finished my MS in ChemE and started on a new project which became my dissertation, “Adapting Multivariate Analysis for Modeling and Monitoring Dynamic Systems”. In order to do this research I needed to develop multivariate analysis routines and process simulations, so MATLAB was the logical tool of choice.

At some point late in 1989 I realized that I had created a significant number of routines that might be of use to other researchers. I collected these functions, wrote sensible help files for them, and wrote a brief manual. I’d been working a lot with Partial Least Squares (PLS) regression, and the bulk of the functions I’d created related to that, so I decided (for better or worse) to call it PLS_Toolbox. Why the underscore? Honestly, I don’t remember. It may have had to do with inability of some operating systems to deal with path names that included whitespace. And I didn’t like running it together, PLSToolbox, because that reads like PL Stoolbox, and I didn’t like the connotation.

So in the fall of 1989 I printed up some manuals for PLS_Toolbox 1.0 and started distributing it around the Chemical Engineering Department and the Center for Process Analytical Chemistry. The rest, as they say, is history. After graduating from UW in 1991, I continued to update PLS_Toolbox and distribute it under the company Eigenvector Technologies. Battelle Pacific Northwest National Laboratory, my employer, had no interest in it. So I worked on it evenings and weekends and continued to release updates.

I founded Eigenvector Research, Inc. with Neal Gallagher on January 1, 1995, though PLS_Toolbox still came out under Eigenvector Technologies until version 2.0. A complete list of releases is given below.

PLS_Toolbox 1.0 late 1989 or early 1990
PLS_Toolbox 1.1 1990
PLS_Toolbox 1.2 1991
PLS_Toolbox 1.3 1993
PLS_Toolbox 1.4 1994 (July)
PLS_Toolbox 1.5 1995 (July; added author Neal B. Gallagher)
PLS_Toolbox 2.0 1998 (April; first version under Eigenvector Research)
PLS_Toolbox 2.1 2000 (November)
PLS_Toolbox 3.0 2002 (December; added authors Rasmus Bro and Jeremy M. Shaver)
PLS_Toolbox 3.5 2004 (August; added authors Willem Windig and R. Scott Koch)
PLS_Toolbox 4.0 2006 (May)
PLS_Toolbox 4.1 2007 (June)
PLS_Toolbox 4.2 2008 (January)

The release of PLS_Toolbox 4.2 this month brings the total number of versions to 13. We’ve been pretty stingy with our version numbers, changing them in increments of only 0.1 even when we added significant functionality. In other software companies PLS_Toolbox 4.2 would probably be known as version 9.1 or something like that.

Hope you enjoyed the history lesson, and thanks for checking in!

BMW

DataSet Object — Letter to MathWorks March 15, 2007

Apr 19, 2007

Dear [MathWorks]:

It seems to me that the best solution to the DataSet Object (DSO) conflict is that we merge the functions and create something that is compatible with both systems. The resulting code could be something that you manage provided that we have some say on what functionality gets added so that it doesn’t conflict with the way our functions work. There would be several advantages to this approach.

1) We have invested a considerable amount of time and effort in our DSO and believe that it is quite well thought out and well executed. It is quite full featured compared with TMW dataset (supports n-way data, multiple sets of labels, classes and axis vectors, axis titles, etc.), and has an editor that works with it. We overloaded over 25 functions that operate on it from things like concatenation to missing data replacement. In an effort to be compatible with yours, we have already added some of the functionality of your dataset.

2) If we were to coordinate on this and offer a single standard, there is more likelihood that it be adopted by other toolboxes, including many of your own, and other 3rd party products. This promises to improve the quality and user-friendliness of many MATLAB toolboxes.

3) TMW surely has additional ideas and resources that could be used to further enhance the DSO. One thing that comes to mind is an integrated editor that is built as part of MATLAB in Java rather than the .m code editor we have now.

4) As an internal data format, the DSO provides a target for data file translation from a wide variety of sources. (We find data translation to be the single biggest impediment to adoption of MATLAB in the chemical, biological, pharma and semiconductor fields.) If the DSO were a well known standard, this should foster the development of more translation routines and thus MATLAB licenses in additional fields.

Anyhow, we’d like to talk to you about this (just like we offered 6 years ago). If you are not interested in cooperating on this, it would at least be nice if you folks would change the name of your dataset to something else. I believe 2007a is still in prerelease, so you’d have time to do that.

Best regards….

BMW

Barry M. Wise, Ph.D.
President
Eigenvector Research, Inc.
3905 West Eaglerock Drive
Wenatchee, WA 98801

Phone: (509)662-9213
Fax: (509)662-9214
Email: bmw@eigenvector.com
Web: eigenvector.com

DataSet Object Conflict

Apr 18, 2007

In 2001 the software developers at EvRI (at the time that was Neal, Rasmus and myself) started thinking about how we could improve the organization of the data sets users analyze with PLS_Toolbox. The problem was that a typical data set contains a fair number of pieces (including the data matrix, wavelength and/or time axes, labels on variables and/or samples, etc.), and they were all floating around independently in the MATLAB workspace. After working for a while, it often happened that you wound up with mismatched pieces. For instance, after deleting a sample from the data matrix, you might forget to delete the corresponding labels, or delete the wrong one.

We considered a number of options. One of them was to create a MATLAB structure array for data sets with a convention on field names for the typical parts. The problem here, though, is that there is no way to really control what gets stuck in the fields, or even to guarantee that the naming conventions are strictly followed. After MUCH discussion we decided that the best way to ensure data set integrity was to create a custom class object. With a custom class object you can control how MATLAB functions act on it, including both your own functions and MATLAB built-in functions. (This is known as “overloading”.) Thus, you can program the tools so that you can’t, for example, associate a wavelength axis with 400 entries with a set of spectra that have 401 channels.
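To make the idea concrete, here is a minimal sketch of that kind of integrity check. It is written in Python rather than MATLAB purely for illustration, and it is not the actual DSO code: the class name `DataSet`, the attributes `axisscale` and `labels`, and the method `delete_sample` are all assumptions made up for this example.

```python
class DataSet:
    """Toy container that keeps a data matrix and its metadata in sync.

    Hypothetical illustration only; not the Eigenvector DSO implementation.
    """

    def __init__(self, data, axisscale=None, labels=None):
        self.data = [list(row) for row in data]
        # Remember the channel count so the checks still work with zero rows.
        self._n_cols = len(self.data[0]) if self.data else 0
        if any(len(row) != self._n_cols for row in self.data):
            raise ValueError("all samples must have the same number of channels")
        self._axisscale = None
        self._labels = None
        if axisscale is not None:
            self.axisscale = axisscale
        if labels is not None:
            self.labels = labels

    @property
    def shape(self):
        return (len(self.data), self._n_cols)

    @property
    def axisscale(self):
        return self._axisscale

    @axisscale.setter
    def axisscale(self, values):
        values = list(values)
        # Refuse an axis whose length does not match the number of channels.
        if len(values) != self._n_cols:
            raise ValueError(
                f"axis has {len(values)} entries but data has {self._n_cols} channels"
            )
        self._axisscale = values

    @property
    def labels(self):
        return self._labels

    @labels.setter
    def labels(self, values):
        values = list(values)
        # Refuse a label set whose length does not match the number of samples.
        if len(values) != len(self.data):
            raise ValueError(
                f"{len(values)} labels supplied for {len(self.data)} samples"
            )
        self._labels = values

    def delete_sample(self, index):
        """Remove one sample and its label together so they cannot drift apart."""
        del self.data[index]
        if self._labels is not None:
            del self._labels[index]
```

With something like this, assigning a 400-entry wavelength axis to 401-channel spectra raises an error instead of silently creating a mismatch, and deleting a sample takes its label along with it. A plain structure with agreed-upon field names cannot enforce either rule.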

The obvious name for this custom class object was “DATASET”. We commonly refer to it around here as the DataSet Object or DSO.

It seemed to us that maintaining the integrity of data sets was a fairly general problem, not just something specific to chemometrics. Maybe this should be a general tool in MATLAB. Because of this, we decided to share our ideas with representatives of The MathWorks. We met with them in June of 2001. The general subject of the meeting (which was attended by several software companies, instrument companies and end users) was using MATLAB and models developed in MATLAB in on-line applications. It would have made sense for TMW to maintain the DataSet object and take input from users so that it could become a general, well-supported tool. We shared our code with TMW, along with our ideas about how the evolution of the DataSet should be managed. After a few follow-on emails it became apparent that TMW wasn’t really interested, so we proceeded on our own.

In January of 2002 we made our DSO publicly available, and announced it in an issue of EigenNews. The DSO was then and still is a free download from Eigenvector. We’ve continued to develop it, carefully adding features and functionality. It now supports multi-way data, multivariate images, batch data of unequal lengths, “soft” deletion of data, meta-data, maintains a history of changes, etc. Essentially all of the upper level functions in PLS_Toolbox use the DSO, as input, output or internally. This has greatly improved the management of data in MATLAB, and we’ve built additional tools for working with DSOs such as the DataSet Editor, which is part of PLS_Toolbox.

This was all well and good until TMW released the “prerelease” version of R2007a. We started getting reports from some of our users that there was a problem. One of them in particular did a very good job of tracking down the problem and reporting it to us and to TMW. We first heard about it on March 1, 2007. The problem was that the Statistics Toolbox from TMW now included a custom class object called “DATASET”. Depending upon which one is first on the path, you get different behavior. If the Stats Toolbox is first on the path, almost all the upper level functions, including interfaces, in PLS_Toolbox won’t work. With a quick patch, we were able to make PLS_Toolbox work normally provided that it is first on the path. But this causes problems with the functions in the Stats Toolbox that use TMW’s DATASET.

We started contacting TMW almost immediately in the hopes of resolving this issue, which had quickly become our biggest support problem. This included the letter we sent them on March 15, which reiterated many of the points we’d made previously. After about 3 weeks of persistent pestering, I finally received a response from the lead developer of the Stats Toolbox in which he “apologized for not getting back to us sooner.” However, in the interim, MATLAB 2007a had gone from prerelease to release. Thus, our early March suggestion that TMW rename their object was met with “As you know we’ve already shipped, so renaming is not an option for us.”

We responded to that by, once again, suggesting that TMW adopt our standard and that we share stewardship of the code. The Stats Toolbox DATASET is very limited compared to EvRI’s DSO, and in fact, we quickly added the few features that theirs had but ours didn’t in order to improve compatibility. It’s now been three weeks since we last suggested this and TMW hasn’t responded.

The good news for PLS_Toolbox users is, provided you have PLS_Toolbox first on the path, everything in it works normally. And because we’ve responded rapidly, the Stats Toolbox is almost unaffected. But we think TMW is missing a great opportunity here to develop a standard that would be of use across a wide variety of application areas. If you think so too, drop TMW a line and let them know.

Thanks!

BMW