@Kiloku @eb If you use a hammer to eat soup you generally don't end up with good results. 
If you want to sum numbers, use the sum function, not one that predicts the next most probable token. I dislike this type of bashing of LLMs because it's trivial to dismiss (OK, they can't do trivial maths, but they can write an entire piece of software for me). There are much riskier outputs that could be used as an example. Funky Excel formulas have always existed...
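To make the first point concrete, a trivial sketch (my own toy example, nothing Copilot-specific): a real sum function is deterministic by construction, which is exactly the property a next-token predictor can't give you.

```python
# Deterministic arithmetic: same inputs, same output, every time.
values = [4, 8, 15, 16, 23, 42]
print(sum(values))  # 108, computed, not predicted
```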
@fcalva @nicolaromano @Kiloku @eb The frightening thing is that they are generating functional software at this point. I don't know how much handholding they need in the present day (because I actually enjoy what I do and I have zero interest in crowdfunding the Torment Nexus), but it started polluting the tech-adjacent maker side of YouTube almost immediately. I've watched multiple people who clearly aren't experienced programmers, but who were persistent enough to go back and forth with an earlier, jankier version of ChatGPT, eventually end up with a functional program, and I'm led to believe that some of the more code-focused models/platforms have gotten a lot less hand-holdy.
I'm just waiting to see how much more data breach coverage costs next time I renew my business insurance, lmao.
@gordoooo_z @nicolaromano @Kiloku @eb It's about as good as copy-pasting from random GitHub repos and Stack Overflow posts. You just have a machine that automates that, more or less.
By design it can only plagiarise the most prevalent code in its dataset, and a bigger and "smarter" model only increases the quality of that retrieval.
@fcalva @nicolaromano @Kiloku @eb Oh totally; 100% on the same page with you. My point when I said "frightening" wasn't so much concern about AI one day replacing me entirely or whatever. It's frightening in that it's getting good enough that [num w/ ≥2 zeros]s of underqualified "programmers" are "producing" (it's extremely difficult to discuss this subject without copious scare quotes, lol) vast amounts of internet-facing code that they couldn't possibly hope to debug or assess for security holes, and then [in the case of public-facing web apps] potentially exposing any number of future users to that risk.
Is it really so different from anything we've all been exposed to in the big-tech social media era? Probably not, though I do expect to see a lot of very embarrassing data breaches in the future from services that let users set up custom AI characters, for example, or the countless AI image-generation services. Not quite AshleyMadison.com, but y'know... potentially not far off, either.
They write complete software for you in the same way they sum numbers. The difference is that with the sum, you can easily spot that the result is wrong.
@knud
What really baffles me, though, is that apparently MS lets Copilot "calculate" stuff instead of simply generating the Excel formula that's needed here.
@nicolaromano @Kiloku @eb
@reinouts@mastodon.green @knud@mastodon.social @nicolaromano@qoto.org @Kiloku@burnthis.town @eb@social.coop why does this baffle you? An LLM doesn't know what a sum is.
@reinouts @nicolaromano @Kiloku @eb
The explanation for the bafflement is: there is no revenue in LLMs, but there's an outrageous bubble of investment, not least by Microsoft. So LLMs get crammed into every product, useful or not, in order to eventually upsell you.
Instead you get an LLM that tries to guess relations in a table - and indeed one can be happy that the answer isn't "Wednesday". But that doesn't make it useful.
@knud
All that may be true, but still. They can and do already write code with LLMs. What is an Excel formula, if not code?
@nicolaromano @Kiloku @eb
@reinouts @knud @nicolaromano @Kiloku @eb useful
@reinouts @nicolaromano @Kiloku @eb
They _don't_ write code. They statistically assemble code snippets from what they were trained on, given a certain prompt. An LLM has no understanding of "variable" or "function".
And an Excel table is basically a lost cause, because Copilot isn't doing a sum; it's trying to predict an outcome based on its training, and there simply weren't many cases where the answer was "15" given that prompt. An LLM can't compute a sum.
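A toy picture of what I mean, with made-up numbers purely for illustration:

```python
# Made-up probabilities: a language model ranks candidate next tokens;
# it never runs an addition algorithm.
next_token_probs = {"15": 0.31, "14": 0.22, "16": 0.18, "Wednesday": 0.01}
answer = max(next_token_probs, key=next_token_probs.get)
print(answer)  # "15" here, but only because that pattern was most frequent
```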
@knud @nicolaromano @Kiloku @eb I agree that LLMs do not have an actual understanding of what a function is, and that they are ill-suited to perform calculations. However, GPT-4o mini, for instance, can assemble an Excel formula just fine. This is what I mean:
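Roughly this kind of exchange (sketched here with the OpenAI Python SDK; the prompt and cell range are my own example):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Write an Excel formula that sums cells B2 to B6."}],
)
print(resp.choices[0].message.content)  # typically something like =SUM(B2:B6)
```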
@reinouts @nicolaromano @Kiloku @eb
Sure, because a gazillion examples described exactly that beforehand. But it can't make a sum, because it doesn't know what that is.
It's like you asking me "please draw me an airplane" vs. you asking me "please fly this airplane".
@knud @nicolaromano @Kiloku @eb I never claimed it can. You're trying to convince me of something I already agreed on.
Have a nice day 😀
@reinouts @nicolaromano @Kiloku @eb
Ah, I see what your actual point was: not why Copilot cannot do the work, but why MS attempts to let Copilot do the work, instead of either letting Excel do the calculation, or at least having Copilot write the Excel formula and then letting Excel do the calculation.
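In pseudo-Python, that split would look something like this (the function names are hypothetical stand-ins, not actual Copilot or Excel APIs):

```python
# Hypothetical division of labour: the model only drafts the formula,
# the spreadsheet engine does the actual arithmetic.
def llm_draft_formula(question: str) -> str:
    # stand-in for the Copilot/LLM step: text in, formula text out
    return "=SUM(B2:B6)"

def excel_evaluate(formula: str, column_b: list[float]) -> float:
    # stand-in for Excel's deterministic calculation engine
    assert formula == "=SUM(B2:B6)"  # toy: only this one formula supported
    return sum(column_b)

total = excel_evaluate(llm_draft_formula("What is the total of column B?"),
                       [1.0, 2.0, 3.0, 4.0, 5.0])
print(total)  # 15.0, computed, not guessed
```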
That again reverts to "we have to package AI into everything because no-one wants to buy it"...
And: have a nice day as well!
@nicolaromano @Kiloku @eb "If you use a hammer to eat soup you generally don't end up with good results."
Actually a damn good metaphor for using LLMs to generate code.
And no, if the method can't solve easy problems, why would you ever trust it to solve hard ones? That's fundamentally not how engineering works.
@PalmAndNeedle @Kiloku @eb Because code is a series of words, which is what LLMs generate. They're not designed to do calculations. 
Indeed, when asked to calculate something, these systems sometimes generate code that, when executed, gives the (mathematically correct, of course) answer; something like the sketch below.
The reality is, LLMs very often generate good, working code. That is not the issue. There are big ethical and environmental issues. Also, because the code works often, but not always, you need to double-check it at all times, which can end up taking longer than writing it yourself (e.g. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/).
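As a hedged illustration, this is the sort of snippet a model typically emits for that; it's not output from any specific model:

```python
# The model writes the program; the Python interpreter does the
# (exact) arithmetic when the code is run.
def total(values):
    result = 0
    for v in values:
        result += v
    return result

print(total([4, 5, 6]))  # 15, correct because Python computed it
```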
@PalmAndNeedle @nicolaromano @Kiloku @eb A distinction is helpful here:
From an engineering viewpoint, current models are very good at predicting the next token, e.g. generating plausible text. But tasks like maths were not considered in the training of these models, so it's expected that they are not very good at them. The classical definition of a model has the "pragmatic feature" (the use case) as a key part.
There are models that are not too bad even at formal mathematics.
@PalmAndNeedle @nicolaromano @Kiloku @eb The question is: why are MS & Co. integrating models into use cases they know full well the models aren't suitable for? My guess: the desperate hope to establish AI in workflows and start making money instead of burning it, and to fix the real problems later. Could also be another sign of a bubble doing bubble things.
@onterof @PalmAndNeedle @Kiloku @eb I suspect some big investors want to see AI in every piece of software they put money into. I doubt the decision came down to a bunch of programmers...
Also, hype is a funny thing. I've seen students develop suboptimal, very convoluted solutions to fairly simple problems so they could put some transformer or whatnot in there. They're baffled when they don't get full marks...
@onterof @PalmAndNeedle @nicolaromano @Kiloku @eb Yes, and there is also software that is *perfect* at formal maths and has been available for decades.
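SymPy is my example here, not something named upthread, but it shows the point: a computer algebra system does the maths symbolically and exactly.

```python
from sympy import Symbol, integrate, sin

x = Symbol("x")
print(integrate(sin(x), x))  # -cos(x), derived symbolically, never guessed
```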
...An entire piece of software that does WHAT, though??
And when will its wrongness surface?
@nicolaromano@qoto.org @Kiloku@burnthis.town @eb@social.coop it's not bashing of LLMs, it's bashing of people who use LLMs without understanding what they do.
@nicolaromano @Kiloku @eb well I’d really appreciate if Microsoft, Google et al stopped shoving hammers everywhere
@nicolaromano @Kiloku @eb but they’re *specifically* adding Copilot as a big new feature for a *spreadsheet*, so “isn’t a spreadsheet generally something where accurate arithmetic matters?” is a valid critique.
@nicolaromano @Kiloku @eb Thing is, they can barely write working Python scripts and webshit. And that's with the state-of-the-art models...