How do machine learning folks eliminate floating-point error? Or, how can they afford not to care about it?

@r2qo Machine learning is far too broad a field to ask questions about that generally. You'd have to ask about a specific class of algorithms, like neural networks or Bayesian networks, to get a coherent answer.

@freemo
Thanks for the reply. I am pretty surprised 😂. I am just starting to learn about this field.
Gradient-based approaches involve lots of floating-point arithmetic, which will certainly run into floating-point error on real computers. There is also propagation of uncertainty. Yet I don't see people worry about it, and that confuses me.
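A standard one-liner shows the kind of thing I mean (a hypothetical sketch, not tied to any particular ML code): floating-point addition is not even associative, so long chains of arithmetic really do pick up rounding error.

```python
# Floating-point addition is not associative: the order of operations
# changes the result, because small terms get absorbed by huge intermediates.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 was swallowed by the huge intermediate sum
```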

@r2qo Gradient descent, even in its simplest form such as the hill-climbing algorithm, is not very susceptible to floating-point error unless the optimal value sits on an extremely steep and narrow peak (so narrow as to be on the same order of size as the error itself), which is rarely the case. There is nothing cumulative about the error when optimizing a single parameter with gradient descent, and when you optimize across many parameters the errors usually don't accumulate either, since they are just as likely to cancel out. Again, the assumption is that the ideal target lies on a curve in the multidimensional parameter space that is not exceptionally steep and narrow.
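A minimal sketch of why (hypothetical code, not from this thread): run the same gradient descent on a smooth quadratic bowl once in float64 and once in float32, so that every float32 step carries roughly 1e-7 relative rounding error, and compare where the two runs land.

```python
import numpy as np

# Minimize f(x) = ||x - a||^2 with plain gradient descent at two precisions.
# 'a', the learning rate, and the step count are arbitrary illustrative values.
a = np.array([1.5, -2.5])

def descend(dtype, steps=200, lr=0.1):
    x = np.zeros(2, dtype=dtype)
    target = a.astype(dtype)
    for _ in range(steps):
        grad = 2.0 * (x - target)          # exact gradient of f
        x = (x - lr * grad).astype(dtype)  # round to this precision each step
    return x

x64 = descend(np.float64)
x32 = descend(np.float32)
print(x64)  # [ 1.5 -2.5]
print(x32)  # [ 1.5 -2.5] to float32 precision
print(np.abs(x64 - x32.astype(np.float64)).max())  # on the order of 1e-7
```

Both runs settle at the same minimum to within the working precision; the per-step rounding never compounds into anything visible.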

@freemo
Thanks for the explanation of the propagation part. I will look into my model and data set more carefully and choose the gradient accordingly.
My concern is that floating-point error is not a fixed value that simply gets propagated, but a structural problem a model faces as it evolves. It could make some part of the computation become nonsense and the result unreliable. Moreover, since we accept the model giving wrong answers at an 'acceptable' rate, we might miss our chance to fix the real problem.

@r2qo As a general rule, outside of the one case I mentioned, floating-point error will never produce a nonsensical result: its effect on the output is only as large as the error itself, which is minuscule. It might affect your error by 0.000000001% or something similarly silly.
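A quick numerical check of that claim (a hypothetical example, not from the thread): evaluate a typical mean-squared-error loss in float32 and in float64 and compare the results.

```python
import numpy as np

# Compare an MSE loss computed entirely in float32 against a float64 reference.
rng = np.random.default_rng(42)
pred = rng.normal(size=10_000)
target = rng.normal(size=10_000)

mse64 = np.mean((pred - target) ** 2)
mse32 = np.mean((pred.astype(np.float32) - target.astype(np.float32)) ** 2,
                dtype=np.float32)  # accumulate in float32 as well

rel = abs(float(mse32) - mse64) / mse64
print(f"relative difference: {rel:.1e}")  # typically ~1e-7
```

The discrepancy sits near float32's machine epsilon (about 1.2e-7), i.e. on the scale of the representation error itself, nowhere near corrupting the result.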

It's best to think of gradient descent with many parameters as simply a multidimensional topography; then it becomes clear that the error has no appreciable effect on ascending or descending towards a local maximum/minimum.
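To make the topography picture concrete, here is a hypothetical sketch (not code from the thread) that deliberately jitters every gradient by noise on the scale of float32 rounding error; the walk still settles into the same basin, because a ~1e-7 nudge is dwarfed by the actual slope of any surface that isn't a needle-thin spike.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(p):
    # gradient of the elongated bowl f(x, y) = (x - 2)^2 + 10 * (y + 1)^2
    return np.array([2 * (p[0] - 2), 20 * (p[1] + 1)])

p = np.array([5.0, 5.0])
for _ in range(2000):
    g = grad(p) + rng.normal(scale=1e-7, size=2)  # simulated rounding noise
    p -= 0.05 * g
print(p)  # ≈ [ 2. -1.], the true minimum, despite the per-step noise
```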

@r2qo Most ML "experts" aren't that great at actual software development. It's unlikely they're even aware there is an issue, nor could they tell you what exact effect it would have, since they're mostly just plugging black boxes together.

@FBICatgirl
That's reasonable, but sadly that doesn't sound scientific to me. It sounds more like a web engineer lol

@r2qo That's what I'm saying: the industry is rife with grifters.