FAQ  Frequently Asked Questions
Issue:

Can I use multiple class sets (categorical variables) together in a SIMCA, PLSDA, or LDA model?
Possible Solutions:

In general, you can only operate on one categorical variable (or class set) at a time (in fact, our Analysis GUI requires you to use only one class set at a time).
You could build individual SIMCA models (each a PCA model built on a single class) for each of the levels (i.e., members) of each of the categorical variables. But SIMCA models are generally independent of one another (adding another member of class A does not change the model for class B), so using multiple categorical variables has no effect there.
For PLSDA and the MLR equivalent (which is Linear Discriminant Analysis, LDA), there are a couple of different scenarios depending on how many "levels" your categorical variables have and whether or not the categorical variables are "complementary" (e.g. just inverses of each other).
The key to understanding what is useful and what isn't is how categorical variables are encoded as a y-block for classification in PLSDA. For a single categorical variable, each class is encoded into a separate "true" or "false" column in the y-block:

    A       1 0
    A       1 0
    A       1 0
    B  -->  0 1
    B       0 1
    B       0 1

(Column 1 is "Is A", column 2 is "Is B".)

Thus, the following two-level categorical variables, if combined, would be redundant and provide a trivial solution.

    A C       1 0 1 0
    A C       1 0 1 0
    A C       1 0 1 0
    B D  -->  0 1 0 1
    B D       0 1 0 1
    B D       0 1 0 1

Using only one of these categories would give you the same answer as using both.
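
The redundancy described above can be checked with a small sketch. This is plain Python used purely for illustration, not PLS_Toolbox code; the helper `one_hot` is a hypothetical stand-in for the class-to-logical encoding, and ordering the columns by sorted class label is an assumption made here.

```python
def one_hot(labels):
    """Encode labels as 0/1 rows, one column per distinct class (sorted order)."""
    classes = sorted(set(labels))
    rows = [[1 if label == cls else 0 for cls in classes] for label in labels]
    return classes, rows

# Two two-level categorical variables that mirror each other exactly.
cat_ab = ["A", "A", "A", "B", "B", "B"]
cat_cd = ["C", "C", "C", "D", "D", "D"]

classes_ab, y_ab = one_hot(cat_ab)
classes_cd, y_cd = one_hot(cat_cd)

# If the two encodings are identical, the second variable adds no information.
redundant = (y_ab == y_cd)

for row in y_ab:
    print(row)
print("redundant:", redundant)
```

Because both encodings produce the same columns, appending the second pair to the first only duplicates what is already in the y-block.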
If you have two non-trivially different categorical variables (still two levels each), you can encode these similarly, creating a four-column y-block:

    A C       1 0 1 0
    A D       1 0 0 1
    A C       1 0 1 0
    B D  -->  0 1 0 1
    B C       0 1 1 0
    B D       0 1 0 1

If you wanted to create this y-block in PLS_Toolbox, you would use the command-line function class2logical with each of the separate categorical variables and combine the results:

    y = [class2logical(cat_AB) class2logical(cat_CD)]

(NOTE: PLSDA in the Analysis GUI automatically handles converting classes into logical y-blocks. It is better to use this automatic management than a hand-constructed y-block.)
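
For readers who want to see the arithmetic behind that concatenation, here is a minimal sketch in plain Python. It is not the PLS_Toolbox implementation: `one_hot` is a hypothetical helper standing in for the class-to-logical step, and the sorted column order is an assumption.

```python
def one_hot(labels):
    """Encode labels as 0/1 rows, one column per distinct class (sorted order)."""
    classes = sorted(set(labels))
    return [[1 if label == cls else 0 for cls in classes] for label in labels]

# The two non-trivially different two-level variables from the example.
cat_ab = ["A", "A", "A", "B", "B", "B"]
cat_cd = ["C", "D", "C", "D", "C", "D"]

y_ab = one_hot(cat_ab)
y_cd = one_hot(cat_cd)

# Side-by-side concatenation of the two encodings, row by row.
y = [row_ab + row_cd for row_ab, row_cd in zip(y_ab, y_cd)]

for row in y:
    print(row)
```

Each row of `y` holds the "Is A", "Is B", "Is C", and "Is D" flags for one sample, matching the four-column layout shown in the table.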
However, notice that the first two y-columns are essentially orthogonal to the second two y-columns: knowing whether a sample is an A or a B tells you little about whether it is a C or a D. This is a strong indication that using these two categorical variables together in a PLSDA or LDA model will be DETRIMENTAL. In general, you will almost ALWAYS get a better model using one categorical variable at a time. The two variables are often working "at odds" with each other (the information needed to separate A's from B's is quite different from that needed to separate C's from D's), so combining them forces the model to do more than one thing at a time.
Finally, if any of your categorical variables has more than two levels, these get encoded as an n-column y-block (e.g. 3 levels yields 3 columns) and, although the same "trivial" vs. "non-trivial" rules for multiple categorical variables still hold, it is far less likely you will get any advantage in combining multiple categories.

    A       1 0 0
    A       1 0 0
    B       0 1 0
    B  -->  0 1 0
    C       0 0 1
    C       0 0 1

You can imagine that combining this three-column y-block with another two- or three-column y-block will only give the model more complexity to handle.
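
To make the growth in y-block width concrete, the sketch below counts columns for a three-level variable on its own and after a two-level variable is appended. It is plain Python for illustration only; `one_hot` is a hypothetical helper and `cat_de` is an invented second variable that does not appear in the example above.

```python
def one_hot(labels):
    """Encode labels as 0/1 rows, one column per distinct class (sorted order)."""
    classes = sorted(set(labels))
    return [[1 if label == cls else 0 for cls in classes] for label in labels]

cat_abc = ["A", "A", "B", "B", "C", "C"]   # three levels, as in the table
cat_de = ["D", "E", "D", "E", "D", "E"]    # hypothetical two-level variable

y_abc = one_hot(cat_abc)
y_de = one_hot(cat_de)
y_combined = [row_abc + row_de for row_abc, row_de in zip(y_abc, y_de)]

width_single = len(y_abc[0])         # one column per level of cat_abc
width_combined = len(y_combined[0])  # columns after appending cat_de

print(width_single, width_combined)
```

Every extra level in either variable adds another column that the model must fit at the same time.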
Still having problems? Check our documentation Wiki or try writing our helpdesk at helpdesk@eigenvector.com