Simon boosted

“Today, I have a vision, a vision of superintelligence from experience”

Presented in his humble way, @richardSutton shares his vision of what AI needs
General, experiential, discovers its own abstractions and not bitter🤢
#NeurIPS2025 #NeurIPS

Simon boosted

in the words of Gemini 3: “It is basically a Frankenstein monster combining a CNN (Convolutional Neural Network) and a Transformer, organized like a mammalian brain” 0.5B, SYNTH huggingface.co/mkurman/Neur...

Simon boosted

Four new models from Mistral today - all Apache 2 licensed, all vision-capable, and one of them is a 3GB model that can run in a web browser and answer questions about things it can see through the webcam! simonwillison.net/2025/Dec/2/i

Simon boosted

Hierarchies 😩... One of the biggest recurring time-consuming issues I sometimes encounter is making decisions about _where_ to put some (new or existing) code/feature, i.e. in which package, new or existing, considering: functional fit (topic), structural fit (pre-existing data format conventions with the rest of a package), and if possible, not introducing new dependencies as a result of the new feature... Sometimes these three aspects are mutually blocking each other and it's so time-consuming to figure out a solution...

I've got very similar issues with most other static hierarchies (e.g. directory-based file systems, hierarchical websearch directories etc.), which is why I think tag-based systems (with intersection/union/negation ops, not just single categories) are a superior way to organize large collections of knowledge (counting source code here too as a form of encoded knowledge). It's also one of the reasons I've been experimenting with and building tools with completely flat collections/graphs and then using queries & transclusion to assemble/extract/select functionality on demand... Need to prepare some screen recordings to share more of those tools/experiments...

#Hierarchy #Tagging #SoftwareArchitecture
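
A minimal sketch of what tag queries with intersection/union/negation could look like over a flat collection. The item names, tags, and the `query` helper are all hypothetical, made up for illustration; they are not taken from the tools mentioned in the post.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical flat collection: item name -> set of tags (no hierarchy).
ITEMS: Dict[str, Set[str]] = {
    "parse_csv": {"io", "parsing", "csv"},
    "parse_json": {"io", "parsing", "json"},
    "render_chart": {"viz", "svg"},
    "export_svg": {"io", "viz", "svg"},
}


def query(
    items: Dict[str, Set[str]],
    all_of: Iterable[str] = (),
    any_of: Iterable[str] = (),
    none_of: Iterable[str] = (),
) -> List[str]:
    """Return sorted names of items whose tags satisfy all three filters.

    all_of  -> intersection: the item must carry every listed tag
    any_of  -> union: the item must carry at least one listed tag
               (the filter is skipped when the list is empty)
    none_of -> negation: the item must carry none of the listed tags
    """
    required, optional, excluded = set(all_of), set(any_of), set(none_of)
    matches = []
    for name, tags in items.items():
        if not required <= tags:
            continue
        if optional and not (optional & tags):
            continue
        if excluded & tags:
            continue
        matches.append(name)
    return sorted(matches)
```

With this toy data, `query(ITEMS, all_of={"io"}, none_of={"parsing"})` selects the items tagged `io` that are not tagged `parsing`, which is the kind of selection a single-category hierarchy cannot express without duplicating entries.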

Simon boosted

Why SUV when you can LSV? (Low Speed Vehicles)

25mph max car! street legal in San Francisco.

Can drive on almost all roads in the city.

This one is not that great, imho (but try it at gocar). I want to see more LSVs. Amsterdam has many, and many types.

all roads w/ 35mph limits or less are ok, almost all roads in SF. Here are all limits on all roads in SF:
docs.google.com/spreadsheets/d

from: catalog.data.gov/dataset/speed (go sf!)

Simon boosted

On Monday, @ahl and I will be joined by members of the Oxide team to talk about a doozy: an 18-year-old ZFS data corruption bug that we recently nailed. We'll be at a special Europe-friendly time: 9a Pacific/noon Eastern/5p GMT -- join us for the wild tale!

discord.gg/QrcKGTTPrF?event=14

@bcantrill @ahl just caught up on this one, eagerly awaiting resolution on the DMA issue, was it the nerd Mandela effect or is it real?

@deviantollam never flew as a kid, but used to love thunderstorms!

@timkellogg.me yeah, if the first attempt fails by wiring money to the wrong account, I don't care that it didn't cost as much in compute. They then omit the better agent / models in the table where they only compare accuracy...

Simon boosted

community note: using cost on the y axis makes it appear like cheaper models are more capable on pass@3

Simon boosted

Someone sent vee the most amazing study on a pediatric hospital contacting professional racing pit teams and asking them to advise on drafting a handoff procedure for ICU patients of the highest concern between wards.

And Ferrari and Williams went "You all are babies, let us show you how it's done" and cut the error rate in handoffs by like 20% and generally found that you need less training, not more, to do it correctly despite having a faster, more detailed protocol.

This is my shit, so much.

Edit: The link to the article is in a reply to prevent masto-hugging the host but people seem to not be seeing it: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1460-9592.2006.02239.x

Simon boosted

@timkellogg.me is this Nano banana pro? Either it's a bit confused with greek vs latin alphabets, or it's making math / typographical puns

Simon boosted

Evolutionary Algorithms for optimizing LLM weights

Gradient descent and backpropagation have a lot of problems, alignment becomes a nightmare. Evolutionary algos fix this, but they don’t scale

A recent paper, EGGROLL, makes it computationally feasible to do now www.alphaxiv.org/abs/2511.16652
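
For readers unfamiliar with the idea, here is a minimal sketch of a plain evolution strategy that updates parameters without backpropagation. It is a generic textbook-style loop on a toy problem, not the EGGROLL method from the paper; `evolve`, `toy_fitness`, and all hyperparameter values are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence


def toy_fitness(params: Sequence[float]) -> float:
    """Hypothetical objective: higher is better, maximal at (1.0, -2.0)."""
    target = (1.0, -2.0)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def evolve(
    fitness: Callable[[Sequence[float]], float],
    theta: Sequence[float],
    generations: int = 200,
    population: int = 20,
    sigma: float = 0.1,
    lr: float = 0.05,
    seed: int = 0,
) -> List[float]:
    """Improve `theta` using only fitness evaluations (no gradients).

    Each generation samples Gaussian perturbations of the parameters,
    scores every perturbed copy, and moves `theta` toward the
    perturbations that scored above the population mean.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(generations):
        noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(population)]
        scores = [
            fitness([t + sigma * n for t, n in zip(theta, noise)])
            for noise in noises
        ]
        mean_score = sum(scores) / population
        for i in range(len(theta)):
            # Score-weighted average of the noise approximates the gradient.
            step = sum(
                (s - mean_score) * noise[i] for s, noise in zip(scores, noises)
            ) / (population * sigma)
            theta[i] += lr * step
    return theta
```

Calling `evolve(toy_fitness, [0.0, 0.0])` should move the parameters close to the toy optimum. The cost is one fitness evaluation per population member per generation, which hints at why scaling this to LLM-sized weight vectors is the hard part.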

Simon boosted

There's amazing new music coming out from European women artists at the moment. This week on the podcast we've got recommendations for Robyn, Oklou, Zaho de Sagazan and Lily Allen. Who should we be adding to our list?

#music #pop #europe #culture #podcast

Simon boosted

Over at the Erdos problem website, AI assistance is now becoming routine. Here is what happened recently regarding Erdos problem #367 erdosproblems.com/367 :

1. On Nov 20, Wouter van Doorn produced a (human-generated) disproof of the second part of this problem, contingent on a congruence identity that he thought was true, and was "sure someone here is able to verify... does indeed hold".

2. A few hours later, I posed this problem to Gemini Deepthink, which (after about ten minutes) produced a complete proof of the identity (and confirmed the entire argument): gemini.google.com/share/81a65a . The argument used some p-adic algebraic number theory which was overkill for this problem. I then spent about half an hour converting the proof by hand into a more elementary proof, which I presented on the site. I then remarked that the resulting proof should be within range of "vibe formalizing" in Lean.

3. Two days later, Boris Alexeev used the Aristotle tool from Harmonic to complete the Lean formalization, making sure to formalize the final statement by hand to guard against AI exploits. This process took two to three hours, and the output can be found at borisalexeev.com/t/Erdos367.le

EDIT: after making this post, I decided to round things out by making AI literature searches on this problem, which (after about fifteen minutes) turned up some related literature on consecutive powerful numbers, but nothing directly relating to #367. chatgpt.com/share/6921427d-9dc gemini.google.com/share/0d2964

Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.