Beginning at the Beginning

Jul 7, 2025

At Eigenvector we’ve been teaching classes in chemometrics and machine learning since our founding more than 30 years ago, and I’ve been teaching them slightly longer than that. Most of our classes are taught with equations that describe the various models we cover and how they are computed. (The exception, of course, is our Chemometrics without Equations series.) But for the equations to be helpful, you actually need to understand them. It has been said that linear algebra is the language of chemometrics, and I think that applies to most data science fields. Certainly some calculus is required, but the most fundamental methods are succinctly expressed in linear algebra notation.

We’d really like our students to understand what is going on “behind the scenes” with the methods they are using, so we start most of our classes with the course we now call “Linear Algebra for Machine Learning and Chemometrics.” It covers the basics, including

  • Vector and matrix operations (addition, subtraction, multiplication)
  • Inner and outer products
  • Vector spaces, subspaces and null spaces
  • Projections onto vectors and subspaces
  • Basis sets, orthogonal and orthonormal matrices, linear independence
  • Gaussian elimination and solving systems of equations
  • Matrix inverses and least squares
  • Matrix rank, rank deficiency and ill-conditioned matrices
  • Singular Value Decomposition (SVD)
  • Pseudoinverses

Most college graduates from the hard sciences took a course in linear algebra that covered these topics, but if it was like the one I had, it never related these concepts to actual practice. Because we employ these methods to describe and manipulate real chemical data, we have many examples of how the concepts are used! An example of that, where we illustrate an outer product as a concentration profile times a pure component spectrum, is shown below.
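For readers who like to see this in code, here is a minimal sketch of the outer product idea in Python with NumPy. The Gaussian shapes, the sizes, and the names `c`, `s` and `X` are assumptions made up for illustration; they are not taken from the course material.

```python
import numpy as np

# Hypothetical concentration profile: one Gaussian peak over 50 time points.
time = np.arange(50)
c = np.exp(-0.5 * ((time - 25) / 5.0) ** 2)

# Hypothetical pure component spectrum: one Gaussian band over 100 channels.
channel = np.arange(100)
s = np.exp(-0.5 * ((channel - 40) / 8.0) ** 2)

# Outer product: row i of X is the pure spectrum scaled by c[i],
# i.e. the response of a single component measured over time.
X = np.outer(c, s)

print(X.shape)                   # (50, 100)
print(np.linalg.matrix_rank(X))  # 1: one chemical component, one rank
```

Every row of `X` is a multiple of the same spectrum, which is why a single-component data set has rank one. Summing several such outer products gives a data matrix whose rank is, at most, the number of components.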

If you understand the concepts outlined above you’re in pretty good shape to start exploring multivariate methods and you’ll have an appreciation of how they work and, perhaps more importantly, why they sometimes fail. Rank, in particular, is such an important concept in modeling and data science that I don’t see how you could do a credible job of modeling without a solid grasp of it. Likewise for ill-conditioning, shown below.
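To make rank deficiency and ill-conditioning concrete, here is a small sketch in Python with NumPy. The matrices `A` and `B` and the right-hand sides are toy values chosen for illustration only, not examples from the course.

```python
import numpy as np

# Rank deficiency: the third column is the sum of the first two,
# so A carries only two independent directions.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0]])
rank_A = np.linalg.matrix_rank(A)
print("rank of A:", rank_A)  # 2

# Ill-conditioning: B is full rank, but its columns are nearly collinear.
B = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
cond_B = np.linalg.cond(B)
print("condition number of B:", cond_B)  # on the order of 4e4

# Solve B x = y for two right-hand sides that differ by only 1e-4.
y1 = np.array([2.0, 2.0001])
y2 = np.array([2.0, 2.0002])
x1 = np.linalg.solve(B, y1)
x2 = np.linalg.solve(B, y2)
print("x1:", x1)  # approximately [1, 1]
print("x2:", x2)  # approximately [0, 2]
```

A change in the fourth decimal place of `y` moves the solution from roughly (1, 1) to roughly (0, 2). That sensitivity is what a large condition number warns about, and it is one reason regression models built on highly correlated variables can fail.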

We assign our Introduction to Linear Algebra as pre-course homework (reference below). It covers most of the material in our Linear Algebra short course, and includes exercises which can be followed in MATLAB. Please feel free to download and use it. If you want to have a good foundation for your data science education, begin at the beginning!

BMW

B.M. Wise and N.B. Gallagher, “An Introduction to Linear Algebra,” Critical Reviews in Analytical Chemistry, 28(1), pp. 1-19, 1998.