Linear algebra is a field of mathematics that is widely regarded as a prerequisite to a deeper understanding of machine learning.

Although linear algebra is a large field with many esoteric theories and findings, the nuts-and-bolts tools and notations taken from the field are practical for machine learning practitioners. With a solid grasp of what linear algebra is, it is possible to focus on just the good or relevant parts.

Linear algebra is a branch of mathematics, but the truth of it is that linear algebra is the mathematics of data. Matrices and vectors are the language of data.

Linear algebra is about linear combinations. That is, it uses arithmetic on columns of numbers, called vectors, and arrays of numbers, called matrices, to create new columns and arrays of numbers. Linear algebra is the study of lines and planes, vector spaces and mappings that are required for linear transforms.
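To make the idea of a linear combination concrete, here is a minimal sketch in plain Python. The helper name linear_combination and the sample vectors are assumptions chosen for illustration; vectors are represented as ordinary lists of numbers.

```python
def linear_combination(scalars, vectors):
    """Return sum_i scalars[i] * vectors[i] for equal-length list vectors."""
    if len(scalars) != len(vectors):
        raise ValueError("need exactly one scalar per vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same length")
    result = [0] * n
    for c, v in zip(scalars, vectors):
        for i in range(n):
            # Scale entry i of v by c and accumulate it.
            result[i] += c * v[i]
    return result


# Two column vectors with three entries each (illustrative values).
v = [1, 2, 3]
w = [4, 5, 6]

# The linear combination 2*v + (-1)*w, computed entry by entry.
combo = linear_combination([2, -1], [v, w])
print(combo)  # [-2, -1, 0]
```

The result is again a column of three numbers, which is the sense in which arithmetic on vectors creates new vectors.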

Linear Algebra and Statistics

Linear algebra is a valuable tool in other branches of mathematics, especially statistics.

The impact of linear algebra on statistics is important to consider, given the foundational relationship both fields have with applied machine learning.

Some clear fingerprints of linear algebra on statistics and statistical methods include:

  • Use of vector and matrix notation, especially with multivariate statistics.

As you can see, modern statistics and data analysis, at least as far as the interests of a machine learning practitioner are concerned, depend on the understanding and tools of linear algebra.
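As one illustration of vector and matrix notation in multivariate statistics, the sketch below computes a sample mean vector and a sample covariance matrix for a small data set. The function names and the data are assumptions made for this example, and the covariance uses the common n - 1 divisor.

```python
def mean_vector(rows):
    """Column-wise mean of a data matrix given as a list of rows."""
    n = len(rows)
    d = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(d)]


def covariance_matrix(rows):
    """Sample covariance S = Xc^T Xc / (n - 1), where Xc is the centered data."""
    n = len(rows)
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    d = len(mu)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


# Three observations of two variables (illustrative values).
X = [[1.0, 2.0],
     [2.0, 4.0],
     [3.0, 6.0]]

mu = mean_vector(X)
S = covariance_matrix(X)
print(mu)  # [2.0, 4.0]
print(S)   # [[1.0, 2.0], [2.0, 4.0]]
```

Here the mean is a vector with one entry per variable and the covariance is a square matrix with one row and one column per variable.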

Vector Spaces

A vector space consists of a set V (elements of V are called vectors), a field F (elements of F are called scalars), and two operations:

  • An operation called vector addition that takes two vectors v, w ∈ V and produces a third vector, written v + w ∈ V.
  • An operation called scalar multiplication that takes a scalar a ∈ F and a vector v ∈ V and produces a new vector, written av ∈ V.

These operations satisfy the following conditions (called axioms):

1. Associativity of vector addition: (u + v) + w = u + (v + w) for all u, v, w ∈ V.

2. Existence of a zero vector: There is a vector in V, written 0 and called the zero vector, which has the property that u + 0 = u for all u ∈ V.

3. Existence of negatives: For every u ∈ V, there is a vector in V, written −u and called the negative of u, which has the property that u + (−u) = 0.

4. Associativity of multiplication: (ab)u = a(bu) for any a, b ∈ F and u ∈ V.

5. Distributivity: (a + b)u = au + bu and a(u + v) = au + av for all a, b ∈ F and u, v ∈ V.

6. Unitarity: 1u = u for all u ∈ V.
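The six axioms can be spot-checked on a familiar example. The sketch below assumes V is the set of pairs of rational numbers and F is the rationals, using Python's fractions.Fraction for exact arithmetic. The helper names (add, scale, check_axioms) and the sample values are hypothetical, and a check on a few vectors is an illustration of the axioms, not a proof.

```python
from fractions import Fraction as Fr


def add(u, v):
    """Vector addition: add corresponding entries."""
    return tuple(a + b for a, b in zip(u, v))


def scale(c, u):
    """Scalar multiplication: multiply every entry by c."""
    return tuple(c * a for a in u)


ZERO = (Fr(0), Fr(0))


def check_axioms(u, v, w, a, b):
    """Evaluate each of the six axioms on the given vectors and scalars."""
    neg_u = scale(Fr(-1), u)
    return {
        "associativity of vector addition":
            add(add(u, v), w) == add(u, add(v, w)),
        "existence of a zero vector": add(u, ZERO) == u,
        "existence of negatives": add(u, neg_u) == ZERO,
        "associativity of multiplication":
            scale(a * b, u) == scale(a, scale(b, u)),
        "distributivity":
            scale(a + b, u) == add(scale(a, u), scale(b, u))
            and scale(a, add(u, v)) == add(scale(a, u), scale(a, v)),
        "unitarity": scale(Fr(1), u) == u,
    }


# Sample vectors and scalars (illustrative values).
u = (Fr(1, 2), Fr(-3))
v = (Fr(2), Fr(1, 3))
w = (Fr(-1), Fr(5, 7))
results = check_axioms(u, v, w, Fr(2, 3), Fr(-4))
for name, holds in results.items():
    print(name, holds)
```

Exact rationals are used instead of floats so that equalities such as (a + b)u = au + bu are not disturbed by rounding.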


Matrix Transformations

Let A be an m×n matrix. The matrix transformation associated to A is the transformation

T : R^n → R^m defined by T(x) = Ax.

This is the transformation that takes a vector x in R^n to the vector Ax in R^m.

If A has n columns, then it only makes sense to multiply A by vectors with n entries. This is why the domain of T(x) = Ax is R^n. If A has m rows, then Ax has m entries for any vector x in R^n; this is why the codomain of T(x) = Ax is R^m.

The definition of a matrix transformation T tells us how to evaluate T on any given vector: we multiply the input vector by a matrix.
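A minimal sketch of this evaluation rule in plain Python follows. The function name matrix_transformation and the sample matrix are assumptions for illustration; the matrix is stored as a list of rows, so a matrix with m rows and n columns accepts inputs with n entries and returns outputs with m entries.

```python
def matrix_transformation(A):
    """Return the function T with T(x) = Ax for an m x n matrix A (list of rows)."""
    n = len(A[0])
    if any(len(row) != n for row in A):
        raise ValueError("every row of A must have the same number of entries")

    def T(x):
        # The input must have as many entries as A has columns.
        if len(x) != n:
            raise ValueError(f"expected a vector with {n} entries, got {len(x)}")
        # Entry i of Ax is the dot product of row i of A with x.
        return [sum(a * xi for a, xi in zip(row, x)) for row in A]

    return T


# A 2 x 3 matrix (illustrative values): T maps 3-entry vectors to 2-entry vectors.
A = [[1, 0, 2],
     [0, 3, -1]]
T = matrix_transformation(A)

y = T([1, 2, 3])
print(y)  # [7, 3]
```

Passing a vector with the wrong number of entries, such as T([1, 2]), raises a ValueError, which mirrors the statement that the product Ax is only defined when x has as many entries as A has columns.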