The tricky bit is getting the architecture and settings right so that it can train on the MNIST dataset in a sane amount of time on my VPS.
So far, the architecture is 784x128x32x10, with a learning rate of 0.001, sigmoid activation for all neurons, and stochastic gradient descent. It's definitely lowering the error over time, but at such a slow rate that it'd take roughly two months before it starts classifying inputs correctly (which is definitely not ideal).
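For concreteness, here's a minimal sketch of that configuration in PyTorch, purely for illustration. The layer sizes, activations, learning rate, and SGD come from the description above; the loss function, one-hot targets, and batch size of 1 are assumptions on my part:

```python
import torch
import torch.nn as nn

# 784 inputs (28x28 MNIST pixels) -> 128 -> 32 -> 10 outputs,
# sigmoid on every layer as described above
model = nn.Sequential(
    nn.Linear(784, 128), nn.Sigmoid(),
    nn.Linear(128, 32),  nn.Sigmoid(),
    nn.Linear(32, 10),   nn.Sigmoid(),
)

# plain SGD with the stated learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# assumption: squared error against one-hot target vectors
loss_fn = nn.MSELoss()

def train_step(x, y_onehot):
    """One stochastic update: a single example per step (batch size 1)."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y_onehot)
    loss.backward()
    optimizer.step()
    return loss.item()
```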