@philipwhite It's a way of having a generalized transformation abstraction in an environment where matrix multiplication is cheap (and higher-level abstractions are not available). It's pretty much the same rationale as grouping rotation, scaling, and skewing together. In the example, every matrix can be a combination of basic transformations in any order, and they can all be chained together with just one operation (matrix multiplication). You don't additionally have to keep track of a set of translation vectors and apply them through a different operation in the correct order. You can't just add one translation at the very end, because any subsequent rotation/scale/skew also affects the translation that came before it.
As a simple example, consider these transformations, applied in this order:
translate right 1 unit,
scale by 2,
translate left 1 unit,
scale by 0.5
With separate translation vectors, it looks like this:
scale0.5 * (scale2 * (pos + right1) + left1)
With translation as a matrix (the rightmost matrix is applied first):
scale0.5 * left1 * scale2 * right1 * pos
The matrix form is much easier to comprehend. More importantly, if you give the matrices descriptive names and later (even at runtime) want to swap one transformation for another (or for a combination of others), the expression itself doesn't change, so it is a generic expression. In the former case, especially if you want to insert extra transformations in the middle, it can get really hairy, and it is impossible at runtime because the shape of the expression itself would have to change.
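To make that concrete, here is a minimal sketch in plain Python, assuming 2D points and 3x3 homogeneous matrices. The helper names (translate, scale, compose, apply) are made up for illustration and are not from any particular library. It checks that both forms give the same result for the four steps above, and then swaps one step for a combination of others without touching the final expression.

```python
from functools import reduce

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translate(tx, ty):
    """Homogeneous 2D translation matrix (hypothetical helper)."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def scale(s):
    """Homogeneous 2D uniform scale matrix (hypothetical helper)."""
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]

def compose(*matrices):
    """Multiply matrices left to right; the rightmost one is applied first."""
    return reduce(matmul, matrices)

def apply(m, point):
    """Apply a homogeneous matrix to a 2D point (implicit w = 1)."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

right1 = translate(1.0, 0.0)
left1 = translate(-1.0, 0.0)
scale2 = scale(2.0)
scale_half = scale(0.5)

pos = (3.0, 4.0)

# Separate translation vectors: scale0.5 * (scale2 * (pos + right1) + left1)
x, y = pos
separate = (0.5 * (2.0 * (x + 1.0) - 1.0),
            0.5 * (2.0 * (y + 0.0) + 0.0))

# Translation as a matrix: scale0.5 * left1 * scale2 * right1 * pos
steps = [scale_half, left1, scale2, right1]
as_matrix = apply(compose(*steps), pos)

# Swap one step for a combination of others at runtime.
# The expression that applies the chain stays exactly the same.
steps[1] = compose(translate(0.0, 2.0), scale(3.0))
swapped = apply(compose(*steps), pos)

print(separate, as_matrix, swapped)
```

The point of keeping the steps in a list is that the chain is just data: replacing, inserting, or removing an entry never changes the code that evaluates it, which is not true of the nested vector expression.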