@r2qo Machine learning is far too broad a field to ask a question like that about in general. You'd have to ask about a specific class of algorithms, like neural networks or Bayesian networks, to get a coherent answer.
@r2qo Gradient descent, even in a simple form like the hill-climbing algorithm, is not very susceptible to floating point error unless the optimal value sits on an extremely steep and narrow peak (so narrow that its width is on the order of the error itself), which is rarely the case. There is nothing cumulative about the error when optimizing a single parameter with gradient descent, and when you optimize across many parameters the errors usually don't accumulate either, since they are just as likely to cancel out. Again, the assumption is that the ideal target lies on a surface in the multidimensional parameter space that is not exceptionally steep and narrow.
@r2qo As a general rule, outside of the one case I mentioned, floating point error will never produce a nonsensical result, because its effect on the output is only as large as the error itself, which is minuscule. It might affect your result by 0.000000001% or something similarly silly.
It's best to think of gradient descent over many parameters as traversing a multidimensional topography; seen that way, it becomes clear that the error has no appreciable effect on ascending or descending towards a local maximum/minimum.
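A minimal sketch of the point above, with an assumed toy objective (the quadratic f(x, y) = (x - 1)^2 + 2(y + 3)^2, chosen for illustration only): running the same gradient descent in single and double precision lands on essentially the same minimum, because the per-step rounding errors are tiny relative to the gradient and do not accumulate in one direction.

```python
import numpy as np

def descend(x0, lr=0.1, steps=200, dtype=np.float64):
    """Gradient descent on f(x, y) = (x - 1)**2 + 2*(y + 3)**2."""
    x = np.array(x0, dtype=dtype)
    for _ in range(steps):
        # Analytic gradient of the toy objective.
        grad = np.array([2 * (x[0] - 1), 4 * (x[1] + 3)], dtype=dtype)
        x = x - dtype(lr) * grad
    return x

lo = descend([5.0, 5.0], dtype=np.float32)  # single precision
hi = descend([5.0, 5.0], dtype=np.float64)  # double precision

# Both converge to the true minimum (1, -3); the precision gap is negligible.
print(lo, hi, np.max(np.abs(lo.astype(np.float64) - hi)))
```

The surface here is smooth and gently curved, matching the assumption in the posts above; on a pathologically steep and narrow peak the two runs could diverge.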
@freemo
Thanks for the part about propagation. I will look into my model and dataset more carefully and choose the gradient accordingly.
Floating point error is not a fixed value that simply gets propagated, but a structural problem a model faces as it evolves. It can make some part of the computation become nonsense and the result unreliable. Moreover, since we accept the model giving wrong answers at an 'acceptable' rate, we may miss our chance to fix the real problem.
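A minimal sketch of the kind of structural failure being described here (the example function is my own, not from the thread): when a computation subtracts two nearly equal intermediate values, the tiny representation error dominates the result (catastrophic cancellation), so one badly arranged step can make the output nonsense even though every input error was minuscule.

```python
import numpy as np

# Evaluate f(x) = (1 - cos(x)) / x**2 near x = 0; the true limit is 0.5.
x = np.float32(1e-4)

# Naive form: cos(x) rounds to 1.0 in float32, so the subtraction
# wipes out all the information and the result is wildly wrong.
naive = (np.float32(1.0) - np.cos(x)) / (x * x)

# Algebraically equivalent rewrite, 1 - cos(x) = 2*sin(x/2)**2,
# avoids the cancellation and stays close to the true value.
stable = np.float32(2.0) * np.sin(x / 2) ** 2 / (x * x)

print(naive, stable)  # naive is badly off; stable is close to 0.5
```

This is the exception @freemo noted rather than the rule, but it shows why the structure of the computation, not just the size of each rounding error, decides whether the result can be trusted.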