So far, the architecture is 728x128x32x10, with a learning rate of 0.001, sigmoid activation for all neurons, and stochastic gradient descent for training. It's definitely lowering the error over time, but so slowly that it'd take ~2 months to actually start guessing inputs correctly (which is definitely not ideal).
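
For reference, here is a minimal sketch of that setup in NumPy, in case it helps to compare against. The layer sizes, learning rate, sigmoid activations, and per-sample SGD updates come from the description above; everything else is an assumption made for illustration: the squared-error loss, the scaled Gaussian weight initialization, and the helper names `init_params`, `forward`, and `sgd_step` are all hypothetical and may differ from the actual implementation.

```python
import numpy as np

LAYER_SIZES = [728, 128, 32, 10]  # as stated in the post
LEARNING_RATE = 0.001             # as stated in the post


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(sizes, rng):
    # Assumption: Gaussian weights scaled by 1/sqrt(fan_in), zero biases.
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    # Returns the input followed by the sigmoid activation of every layer.
    activations = [x]
    for W, b in params:
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations


def sgd_step(params, x, target, lr=LEARNING_RATE):
    # One stochastic gradient descent update on a single (x, target) pair.
    activations = forward(params, x)
    out = activations[-1]
    # Assumption: squared-error loss; the post does not say which loss is used.
    loss = 0.5 * np.sum((out - target) ** 2)

    # Error signal at the output layer: dLoss/dz for sigmoid units.
    delta = (out - target) * out * (1.0 - out)
    new_params = list(params)
    for i in reversed(range(len(params))):
        W, b = params[i]
        a_prev = activations[i]
        grad_W = np.outer(delta, a_prev)
        grad_b = delta
        if i > 0:
            # Backpropagate through the pre-update weights and the sigmoid below.
            delta = (W.T @ delta) * a_prev * (1.0 - a_prev)
        new_params[i] = (W - lr * grad_W, b - lr * grad_b)
    return new_params, loss
```

Calling `sgd_step` once per training example and logging the returned loss gives a baseline curve to compare against when changing the learning rate, activation, or initialization.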