@freemo Without reading the scans of your notes in detail, because the resolution is inadequate, and you really have no excuse not to typeset this in LaTeX, anyhow:
You can't represent a multilayer perceptron by linear algebra only, because you could then reduce everything to an equivalent
(number of inputs)x(number of outputs) matrix.
So, in the spirit of Helmut Kohl:
"Entscheidend ist, was hinten rauskommt" ("What matters is what comes out at the back end"),
[https://en.wikiquote.org/wiki/Helmut_Kohl]
there must be some sort of sigmoid [https://en.wikipedia.org/wiki/Sigmoid_colon]
somewhere in your equations to make your algebra non-linear.
Where?
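The collapse argument above can be checked numerically. This is only a minimal sketch in plain Python: the layer sizes, the random weights, and the helper names `matmul` and `matvec` are made up for illustration and are not taken from the notes under discussion. With no transfer function between layers, multiplying the weight matrices together gives one matrix that reproduces the whole network.

```python
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(a, x):
    """Multiply matrix a (n x k) by vector x (length k)."""
    return [sum(w * v for w, v in zip(row, x)) for row in a]

random.seed(0)

# Hypothetical layer sizes: 4 inputs -> 5 hidden -> 3 hidden -> 2 outputs.
sizes = [4, 5, 3, 2]
# One weight matrix per layer, stored as (units out) x (units in).
weights = [[[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes, sizes[1:])]

x = [random.uniform(-1, 1) for _ in range(sizes[0])]

# Layer-by-layer pass with NO transfer function (purely linear network).
y_layers = x
for w in weights:
    y_layers = matvec(w, y_layers)

# Collapse all layers into one matrix: W3 * W2 * W1.
collapsed = weights[0]
for w in weights[1:]:
    collapsed = matmul(w, collapsed)
y_single = matvec(collapsed, x)

# Largest difference between the two results (floating-point rounding only).
max_diff = max(abs(a - b) for a, b in zip(y_layers, y_single))
print(len(collapsed), "x", len(collapsed[0]), "matrix, max difference:", max_diff)
```

With the row convention used here the collapsed matrix has one row per output and one column per input, which is the transpose of the "(number of inputs)x(number of outputs)" shape mentioned above; either way, a single matrix does all the work once every layer is linear.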
@tatzelbrumm Also, as a side note, an MLP does not **need** to have a non-linear transfer function. It can be linear, and some linear functions can even approximate it well enough to be usable.
Check out the rectified linear unit (ReLU), which is piecewise linear rather than strictly linear. With a truly linear transfer function, the three-step iteration per layer in the equations I showed above becomes linear, so there is no need to iterate per layer: you can reduce the whole network to a set of matrices and operate on them entirely in LA. I didn't go that far, though.
@tatzelbrumm Agreed, it's not purely linear. The step where the transfer function is applied is of course a non-linear step. The network represents well in LA, but you simply can't represent a whole network as a single matrix; you have to cycle between three steps.
Basically, each layer winds up becoming its own matrix. The connections, the multiplication by the weights, and the summation steps are all LA.
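The per-layer cycle described above might be sketched as follows. This is an illustration only: the tiny two-layer network, its weights, and the function names `layer` and `forward` are hypothetical, and ReLU is used merely as one example of a transfer function. The check at the end tests additivity, a property that any single matrix would satisfy.

```python
def relu(v):
    """Example transfer function, applied elementwise."""
    return [max(0.0, t) for t in v]

def layer(weights, x, transfer):
    # Steps 1 and 2 (linear algebra): multiply by the weights and sum per unit.
    z = [sum(w * v for w, v in zip(row, x)) for row in weights]
    # Step 3: apply the transfer function to each summed value.
    return transfer(z)

def forward(network, x, transfer):
    """Cycle through the three steps once per layer."""
    for weights in network:
        x = layer(weights, x, transfer)
    return x

# Hypothetical network: one weight matrix per layer, rows = units in that layer.
network = [
    [[1.0, -1.0], [-1.0, 1.0]],  # 2 inputs -> 2 hidden units
    [[1.0, 1.0]],                # 2 hidden units -> 1 output
]

out_a = forward(network, [1.0, 0.0], relu)
out_b = forward(network, [0.0, 1.0], relu)
out_sum = forward(network, [1.0, 1.0], relu)

# A single matrix M would give M(a + b) == M a + M b.
print(out_a, out_b, out_sum)  # prints [1.0] [1.0] [0.0]
```

Here the outputs for the two inputs add up to 2.0, while the output for their sum is 0.0, so no single matrix can reproduce this network; the weight matrices stay separate and the transfer step has to be applied between them.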