I got bored so I wanted to see if I could remember how to represent a multilayer perceptron using linear algebra. I think I summed it up fairly well.
@tatzelbrumm Agreed, it's not purely linear. The step where the transfer function is applied is of course a non-linear step. It maps well onto LA, but you simply can't represent the whole network as a single matrix; you have to cycle between three steps for each layer.
Basically each layer winds up becoming its own matrix. The connections, the multiplication by the weights, and the summation are all LA.
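Here's a rough sketch of that per-layer cycle in plain Python, just to make it concrete. The names (`matvec`, `forward`), the 2-3-1 shape and the weight values are all made up for illustration, and I left out bias terms to keep it to the steps mentioned above.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x.
    This covers the connections, the weighting and the summation in one go."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(layers, x, transfer):
    """Push x through the network: one weight matrix per layer,
    followed by the transfer function applied element-wise."""
    for W in layers:
        z = matvec(W, x)                 # linear part (pure LA)
        x = [transfer(zi) for zi in z]   # non-linear part (not LA)
    return x

# Hypothetical 2-3-1 network with made-up weights (no biases).
layers = [
    [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]],  # 3x2: input -> hidden
    [[1.0, -1.0, 0.5]],                      # 1x3: hidden -> output
]

output = forward(layers, [1.0, 2.0], math.tanh)
print(output)
```

The only part that isn't a matrix operation is the `transfer` call, which is why the loop over layers can't be folded away in general.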
@tatzelbrumm Also, side note: an MLP does not **need** to have a non-linear transfer function. It can be linear, although then the whole network can only represent a linear map, and some piecewise-linear functions can approximate the usual non-linear ones well enough to be usable.
Check out the rectified linear unit (ReLU): it's piecewise linear rather than truly linear, so with it you still have to iterate per layer. If the transfer function is actually linear (the identity, say), then the three steps I showed above, applied per layer, become linear and there's no need to iterate per layer: the per-layer matrices multiply together into a single matrix and you can operate on the network entirely in LA. I didn't go that far though.
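A quick sanity check of when the layers really do collapse, again as a plain-Python sketch with made-up weights and hypothetical helper names (`matmul`, `matvec`, `relu`), and no biases. With an identity transfer function the two weight matrices multiply into one; with ReLU in between, the result differs for inputs that drive a hidden unit negative.

```python
def matmul(A, B):
    """Matrix product of A (m x n) and B (n x p), both lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(z):
    """Rectified linear unit: piecewise linear, not linear."""
    return max(0.0, z)

W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # made-up 3x2 hidden layer
W2 = [[1.0, -1.0, 0.5]]                      # made-up 1x3 output layer
x = [1.0, -2.0]

# Identity transfer function: layer-by-layer equals one combined matrix.
W_total = matmul(W2, W1)
layered_linear = matvec(W2, matvec(W1, x))
collapsed = matvec(W_total, x)

# ReLU between the layers: the single combined matrix no longer matches.
layered_relu = matvec(W2, [relu(z) for z in matvec(W1, x)])

print(layered_linear, collapsed, layered_relu)
```

For this input two of the three hidden units go negative, so ReLU zeroes them and the output moves away from what `W_total` alone would give.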