How do machine learning folks eliminate floating-point error? Or, how can they afford not to care about it?

@r2qo Machine learning is far too broad a field to ask questions about that generally. You'd have to ask about a specific class of algorithms, like neural networks or Bayesian networks, to get a coherent answer.

@freemo
Thanks for the reply. I am pretty surprised 😂. I am just starting to learn about this field.
Gradient-based approaches involve lots of floating-point arithmetic, which will certainly run into floating-point error on real computers. There is also propagation of uncertainty. Yet I don't see people worry about it, and that confuses me.
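A standard one-liner shows the kind of thing I mean (a hypothetical sketch, not tied to any particular ML code): floating-point addition is not even associative, so long chains of arithmetic really do pick up rounding error.

```python
# Floating-point addition is not associative: the order of operations
# changes the result, because small terms get absorbed by huge intermediates.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 was swallowed by the huge intermediate sum
```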

@r2qo Gradient descent, even in its simplest form such as the hill-climbing algorithm, is not very susceptible to floating-point error unless the optimal value sits on an extremely steep and narrow peak (so narrow as to be on the same order of size as the error itself), which is rarely the case. There is nothing cumulative about the error when optimizing a single parameter with gradient descent, and when you optimize across many parameters the errors usually don't accumulate either, since they are just as likely to cancel out. Again, the assumption is that the ideal target lies on a curve in the multidimensional parameter space that is not exceptionally steep and narrow.
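A minimal sketch of why (hypothetical code, not from this thread): run the same gradient descent on a smooth quadratic bowl once in float64 and once in float32, so that every float32 step carries roughly 1e-7 relative rounding error, and compare where the two runs land.

```python
import numpy as np

# Minimize f(x) = ||x - a||^2 with plain gradient descent at two precisions.
# 'a', the learning rate, and the step count are arbitrary illustrative values.
a = np.array([1.5, -2.5])

def descend(dtype, steps=200, lr=0.1):
    x = np.zeros(2, dtype=dtype)
    target = a.astype(dtype)
    for _ in range(steps):
        grad = 2.0 * (x - target)          # exact gradient of f
        x = (x - lr * grad).astype(dtype)  # round to this precision each step
    return x

x64 = descend(np.float64)
x32 = descend(np.float32)
print(x64)  # [ 1.5 -2.5]
print(x32)  # [ 1.5 -2.5] to float32 precision
print(np.abs(x64 - x32.astype(np.float64)).max())  # on the order of 1e-7
```

Both runs settle at the same minimum to within the working precision; the per-step rounding never compounds into anything visible.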

@freemo
Thanks for the explanation of the propagation part. I will look into my model and data set more carefully and choose the gradient accordingly.
My concern is that floating-point error is not a fixed value that simply gets propagated, but a structural problem a model faces as it evolves. It could make some part of the computation become nonsense and the result unreliable. Moreover, since we accept the model giving wrong answers at an 'acceptable' rate, we might miss our chance to fix the real problem.

@r2qo As a general rule, outside of the one case I mentioned, floating-point error will never produce a nonsensical result: its effect on the output is only as large as the error itself, which is minuscule. It might affect your error by 0.000000001% or something similarly silly.
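A quick numerical check of that claim (a hypothetical example, not from the thread): evaluate a typical mean-squared-error loss in float32 and in float64 and compare the results.

```python
import numpy as np

# Compare an MSE loss computed entirely in float32 against a float64 reference.
rng = np.random.default_rng(42)
pred = rng.normal(size=10_000)
target = rng.normal(size=10_000)

mse64 = np.mean((pred - target) ** 2)
mse32 = np.mean((pred.astype(np.float32) - target.astype(np.float32)) ** 2,
                dtype=np.float32)  # accumulate in float32 as well

rel = abs(float(mse32) - mse64) / mse64
print(f"relative difference: {rel:.1e}")  # typically ~1e-7
```

The discrepancy sits near float32's machine epsilon (about 1.2e-7), i.e. on the scale of the representation error itself, nowhere near corrupting the result.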

It's best to think of gradient descent with many parameters as simply a multidimensional topography; then it becomes clear that the error has no appreciable effect on ascending or descending towards a local maximum/minimum.
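To make the topography picture concrete, here is a hypothetical sketch (not code from the thread) that deliberately jitters every gradient by noise on the scale of float32 rounding error; the walk still settles into the same basin, because a ~1e-7 nudge is dwarfed by the actual slope of any surface that isn't a needle-thin spike.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(p):
    # gradient of the elongated bowl f(x, y) = (x - 2)^2 + 10 * (y + 1)^2
    return np.array([2 * (p[0] - 2), 20 * (p[1] + 1)])

p = np.array([5.0, 5.0])
for _ in range(2000):
    g = grad(p) + rng.normal(scale=1e-7, size=2)  # simulated rounding noise
    p -= 0.05 * g
print(p)  # ≈ [ 2. -1.], the true minimum, despite the per-step noise
```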

@r2qo Most ML "experts" aren't that great at actual software development. It's unlikely they're even aware there is an issue, nor could they tell you what exact effect it would have, since they're mostly just plugging black boxes together.

@FBICatgirl
That's reasonable, but sadly that doesn't sound scientific to me. It sounds more like a web engineer lol

@r2qo That's what I'm saying: the industry is rife with grifters.