Hey lazymastodon, I have a linear algebra question.

So I've been thinking a bit about principal component analysis as of late. The way to find the vector of most variance in a multidimensional dataset is to make every datapoint a column of a matrix (after subtracting each axis's mean), multiply that matrix by its transpose, and find the eigenvectors of the resulting square matrix.
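For concreteness, here's a minimal numpy sketch of that recipe, with made-up random data (the shapes and the centering step are the only things that matter here):

```python
import numpy as np

# Hypothetical data: 200 random 3-d points, one per column.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 200))

A = A - A.mean(axis=1, keepdims=True)  # center each axis on its mean
C = A @ A.T / (A.shape[1] - 1)         # 3x3: A times its transpose
                                       # (the 1/(n-1) scaling doesn't change eigenvectors)

eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric
top = eigvecs[:, np.argmax(eigvals)]   # eigenvector with the largest eigenvalue
print("direction of most variance:", top)
```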

Here's my question: I don't have a good intuition for what "multiply the matrix by its transpose" is doing. That compares every axis to every other axis by multiplying the same-datapoint components together and summing the result across datapoints, but like... Why does that result in an interesting matrix instead of a pile of noise?

Okay, so I figured this part out: a (mean-centered) data matrix multiplied by its transpose is a covariance matrix, up to a constant factor. By which I mean: the higher the value in a given (row, col), the more the data in those two axes are correlated.

en.wikipedia.org/wiki/Covarian
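A quick way to convince yourself (again with hypothetical random data): numpy's `np.cov` treats each row as an axis and each column as an observation, so for centered data it should match `A @ A.T` divided by `n - 1` exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 50))           # 50 points in 3-d, one per column
A = A - A.mean(axis=1, keepdims=True)  # np.cov centers internally; do it here too

# For centered data, A @ A.T scaled by 1/(n - 1) is exactly the covariance matrix.
print(np.allclose(A @ A.T / (A.shape[1] - 1), np.cov(A)))  # True
```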

To simplify, consider a 3x3 matrix `A` (three datapoints in three dimensions, one per column) and multiply `A` by `transpose(A)`.

What each cell of the result is telling you is how strongly a change in the value on the row axis comes with the same change in the value on the column axis. So the diagonal will always be large, because data on an axis always correlates with itself: when you change the value of `x`, the value of `x` changes in *exactly* the same way, and the diagonal entries are sums of `x*x = x^2` terms, which are never negative. Cell 0,2, by contrast, tells you how much changing `x` tends to change `z` the same way. If it's the same value as cell 0,0, then the points lie on a diagonal in the xz-plane: changing `x` comes with the exact same change in `z`.
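Here's that xz-diagonal case as a quick check, with hypothetical data where `z` is literally a copy of `x`:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = rng.normal(size=100)
A = np.vstack([x, y, x])               # z is a copy of x: points on the xz diagonal
A = A - A.mean(axis=1, keepdims=True)

C = A @ A.T / (A.shape[1] - 1)
print(C[0, 0], C[0, 2])                # equal: changing x changes z identically
print(C[0, 1])                         # near zero: x and y are independent
```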

I still need to cogitate a bit on why the eigenvector with the largest eigenvalue of this matrix is the axis along which the data has the highest variance in the original coordinate space.
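In the meantime, here's a numerical sanity check (again with made-up data, stretched along a known axis): the variance of the data projected onto a unit vector `v` works out to `v.T @ C @ v`, and the top eigenvector of `C` beats any random direction I throw at it:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical anisotropic data: stretched 3x along the first axis.
A = np.diag([3.0, 1.0, 0.3]) @ rng.normal(size=(3, 500))
A = A - A.mean(axis=1, keepdims=True)
C = A @ A.T / (A.shape[1] - 1)

eigvals, eigvecs = np.linalg.eigh(C)
top = eigvecs[:, -1]                   # eigh sorts eigenvalues ascending

def variance_along(v):
    # Variance of the data projected onto unit vector v; equals v.T @ C @ v.
    return np.var(v @ A, ddof=1)

print(variance_along(top))             # ~ the largest eigenvalue
for _ in range(3):                     # any other unit direction does worse
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    print(variance_along(v))
```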
