Surprising result from OpenAI: one of their research models achieved a gold medal performance in this year's International Mathematical Olympiad /without/ using tools
Just a classic next-token-predicting LLM with a bunch of reinforcement learning layered on top
https://simonwillison.net/2025/Jul/19/openai-gold-medal-math-olympiad/
It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance one gives the tool, and how one reports the results.
One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over the course of two days, each contestant is given four and a half hours on each day to solve three difficult mathematical problems, using only pen and paper. No communication between contestants (or with the team leader) during this period is permitted, although the contestants can ask the invigilators for clarification on the wording of the problems. The team leader advocates for the students in front of the IMO jury during the grading process, but is not involved in the IMO examination directly.
Scoring well enough at the IMO to receive a medal, particularly a gold medal or a perfect score, is widely regarded as a highly selective measure of mathematical achievement for a high school student; this year the threshold for gold was 35/42, which corresponds to answering five of the six questions perfectly. Even answering one question perfectly merits an "honorable mention". (1/3)
https://halone.within.lgbt need to test something, thanks!
Boost for reach!
I scraped the schedule for Open Sauce 2025 this morning and built an alternative schedule interface with the option to add everything to your calendar (via ICS)... working entirely on my iPhone, using OpenAI Codex and Claude Artifacts
I guess you could call this "vibe scraping"? OpenAI Codex turns out to be great at writing custom scrapers if you give it internet access and tell it to download and install Playwright
Prompts + transcripts: https://simonwillison.net/2025/Jul/17/vibe-scraping/
Notes on Voxtral - the new audio-and-text-input models released by Mistral yesterday. They're open weight (Apache 2) and also available via Mistral's API, so I added support for them to my llm-mistral plugin https://simonwillison.net/2025/Jul/16/voxtral/
I've been thinking about this comment from Ted a lot since he posted it. First of all, he seems entirely right that creating a system with independent goals and (the equivalent of) emotional states but with no real rights is monstrous (cont'd) /
RE: https://bsky.app/profile/did:plc:565ebob5f6hw33hjdkxty6qj/post/3ltq3xtqtjc2s
https://www.europesays.com/fr/234842/ In Lausanne, Photo Elysée exhibits a work on papet vaudois – rts.ch #arts #ArtsAndDesign #ArtsEtDesign #ArtsEtDivertissement #ArtsVisuels #CantonDeVaud #Culture #Design #Divertissement #Enquête #Entertainment #ExpositionArtistique #FR #France #Lausanne #PhotoElysée #photographe #Photographie #RomainMader #Suisse #Tourisme
@noplasticshower "Lone star ticks are aggressive and can speedily follow a human target if they detect them. “They will hunt you, they are like a cross between a lentil and a velociraptor,” said Sharon Pitcairn Forsyth, a conservationist who lives in the Washington DC area."
You see, optical fiber is a kind of a very, very long catwalk. You drop the cat in New York, and it walks all the way to Los Angeles. Do you understand this? And single-mode fiber works exactly the same way: you send the signal here, and it arrives there. The only difference is, the catwalk is so narrow the cat has to squeeze into a very specific posture to fit in - called the mode.
I'm really impressed by the new Gemma 3n
I tried a 7.5GB model from Ollama and a 15GB model through mlx-vlm - they seem very capable, and this is the first model of that size I've tried that can handle both image AND audio input in addition to text! https://simonwillison.net/2025/Jun/26/gemma-3n/
My notes on Gemini CLI, including poking around in their system prompt which I've extracted into a more readable rendered Gist https://simonwillison.net/2025/Jun/25/gemini-cli/
On the heels of @bcantrill’s blog post about the similarities between aspiring college athletes finding a team and entrepreneurs raising a round of capital, Robert Bogart joined us to discuss his own experiences with both, and the life lessons accrued along the way. https://youtu.be/3z_TQxe9jx4
code / data wrangler in Switzerland.
Recovering reply guy. Posts random photos once in a while.