@r2qo Gradient descent, even in its simplest form such as the hill climbing algorithm, is not very susceptible to floating point error unless the optimal value sits on an extremely steep and narrow peak (so narrow as to be on the same order of magnitude as the error itself), which is rarely the case. There is nothing cumulative about the error when optimizing a single parameter with gradient descent, and when you optimize across many parameters the errors usually don't accumulate either, since they are just as likely to cancel out. Again, the assumption is that the ideal target lies on a curve in the multidimensional space of the given parameters that is not exceptionally steep and narrow.
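To make that concrete, here is a rough sketch (a toy example of my own, not from any real model): run the same gradient descent on a smooth one-dimensional loss in float32 and float64 and compare where they land.

```python
import numpy as np

def descend(dtype, steps=1000, lr=0.1):
    """Minimize (x - 3)^2 by plain gradient descent in the given precision."""
    x = dtype(5.0)                            # arbitrary starting point
    lr = dtype(lr)
    for _ in range(steps):
        grad = dtype(2.0) * (x - dtype(3.0))  # d/dx of (x - 3)^2
        x = x - lr * grad                     # rounding error enters every step
    return x

x32 = descend(np.float32)
x64 = descend(np.float64)
print(x32, x64, abs(float(x32) - float(x64)))
# Both land at ~3.0 and agree to float32 precision or better: a thousand
# steps of per-operation rounding never snowballed into anything visible.
```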
@freemo
Thanks for the part about propagation; I will look into my model and data set more carefully and choose the gradient accordingly.
Floating point error is not just a fixed value that gets propagated, but a structural problem that a model faces as it evolves. It could make some part of the computation become nonsense and the result unreliable. Moreover, since we accept the model giving wrong answers at an 'acceptable' rate, we might miss our chance to fix the real problem.
@r2qo As a general rule, outside of the one case I mentioned, floating point error will never produce a nonsensical result, as its effect on the output is only as large as the error itself, which is minuscule. So it may affect your error by 0.000000001% or something similarly silly.
It's best to think of gradient descent with many parameters as simply traversing a multi-dimensional topography; seen that way, it becomes clear that the error has no appreciable effect on ascending or descending towards a local maximum or minimum.
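Here is the same point in two dimensions (again a toy of my own making; the bowl-shaped loss and the 1e-9 jitter standing in for rounding error are arbitrary choices): inject error-sized noise into every gradient and the descent finds the same minimum anyway.

```python
import numpy as np

rng = np.random.default_rng(0)

def descend(noise_scale, steps=2000, lr=0.05):
    """Minimize ||p - (1, 1)||^2, optionally jittering each gradient."""
    p = np.array([4.0, -2.0])                        # arbitrary starting point
    target = np.array([1.0, 1.0])
    for _ in range(steps):
        grad = 2.0 * (p - target)                    # exact gradient of the bowl
        grad += noise_scale * rng.standard_normal(2) # stand-in for rounding error
        p = p - lr * grad
    return p

print(descend(0.0))   # exact gradients    -> ~[1. 1.]
print(descend(1e-9))  # error-sized jitter -> ~[1. 1.], indistinguishable
```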
@freemo
Thanks for the reply. I am pretty surprised😂. I am just starting to learn about this field.
Gradient-based approaches involve lots of floating point arithmetic, and that will certainly hit floating point error in computers. There is also propagation of uncertainty. However, I don't see people worry about it. That gets me confused.
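For concreteness, this is the kind of per-operation rounding error I mean (plain IEEE 754 double behaviour, nothing specific to any library):

```python
print(0.1 + 0.2 == 0.3)        # False: none of these values is exactly representable
print(abs((0.1 + 0.2) - 0.3))  # ~5.6e-17, a single operation's worth of rounding
```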