Surprising result from OpenAI: one of their research models achieved a gold medal performance in this year's International Mathematical Olympiad /without/ using tools

Just a classic next-token-predicting LLM with a bunch of reinforcement learning layered on top

https://simonwillison.net/2025/Jul/19/openai-gold-medal-math-olympiad/

OpenAI’s gold medal performance on the International Math Olympiad

This feels notable to me. OpenAI research scientist Alexander Wei: I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold …

Simon Willison’s Weblog

**Simon** · 2 days ago *

Simon boosted

**Terence Tao** @tao@mathstodon.xyz · 2 days ago *

It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance one gives the tool, and how one reports their results.

One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over the course of two days, each contestant is given four and a half hours on each day to solve three difficult mathematical problems, given only pen and paper. No communication between contestants (or with the team leader) during this period is permitted, although the contestants can ask the invigilators for clarification on the wording of the problems. The team leader advocates for the students in front of the IMO jury during the grading process, but is not involved in the IMO examination directly.

The IMO is widely regarded as a highly selective measure of mathematical achievement for a high school student to be able to score well enough to receive a medal, particularly a gold medal or a perfect score; this year the threshold for the gold was 35/42, which corresponds to answering five of the six questions perfectly. Even answering one question perfectly merits an "honorable mention". (1/3)
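The arithmetic behind the gold threshold can be checked with a small sketch. It assumes the standard IMO marking scheme of 7 points per problem, which the post implies but does not state; the function name is hypothetical.

```python
# Assumption: each of the six IMO problems is marked out of 7 points.
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6
GOLD_THRESHOLD = 35  # this year's gold cutoff, as quoted in the post

max_score = POINTS_PER_PROBLEM * NUM_PROBLEMS


def perfect_solutions_needed(threshold, points_per_problem=POINTS_PER_PROBLEM):
    """Smallest number of perfectly solved problems that reaches the threshold."""
    return -(-threshold // points_per_problem)  # ceiling division


print(max_score)                                 # 42
print(perfect_solutions_needed(GOLD_THRESHOLD))  # 5
```

In practice partial credit means a contestant could also reach 35 with a different mix of scores; the sketch only covers the "five perfect answers" reading given in the post.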

**Simon** · 6 days ago

Simon boosted

Xe @cadey@pony.social · 6 days ago

https://halone.within.lgbt need to test something, thanks!

Boost for reach!

**Simon** · 4 days ago

Simon boosted

**Simon Willison** @simon@simonwillison.net · 4 days ago

I scraped the schedule for Open Sauce 2025 this morning and built an alternative schedule interface with the option to add everything to your calendar (via ICS)... working entirely on my iPhone, using OpenAI Codex and Claude Artifacts

I guess you could call this "vibe scraping"? OpenAI Codex turns out to be great at writing custom scrapers if you give it internet access and tell it to download and install Playwright

Prompts + transcripts: https://simonwillison.net/2025/Jul/17/vibe-scraping/

Screenshot of a blue page, Open Sauce 2025, July 18-20 2025, Download Calendar ICS button, then Friday 18th and Saturday 19th and Sunday 20th pill buttons, Friday is selected, the Welcome to Open Sauce with William Osman event on the Industry Stage is visible.
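As a rough illustration of the "add everything to your calendar (via ICS)" step, here is a minimal sketch that turns a list of scraped events into an iCalendar file using only the Python standard library. It is not the code from the linked transcripts: the function names, the `example.invalid` UIDs, the PRODID string, and the event start and end times are all placeholder assumptions. Only the event title and stage name come from the screenshot description above.

```python
from datetime import datetime, timezone


def escape_text(value: str) -> str:
    """Escape characters that are special in iCalendar TEXT values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def fmt(dt: datetime) -> str:
    """Format a timezone-aware datetime as a UTC iCalendar timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_ics(events, now=None) -> str:
    """Build a minimal VCALENDAR string from dicts with title/location/start/end."""
    now = now or datetime.now(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//schedule-sketch//EN",  # placeholder product id
    ]
    for i, ev in enumerate(events):
        lines += [
            "BEGIN:VEVENT",
            f"UID:event-{i}@example.invalid",  # placeholder UID scheme
            f"DTSTAMP:{fmt(now)}",
            f"DTSTART:{fmt(ev['start'])}",
            f"DTEND:{fmt(ev['end'])}",
            f"SUMMARY:{escape_text(ev['title'])}",
            f"LOCATION:{escape_text(ev['location'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Placeholder times: the real schedule times are not given in the post.
sample_events = [
    {
        "title": "Welcome to Open Sauce with William Osman",
        "location": "Industry Stage",
        "start": datetime(2025, 7, 18, 17, 0, tzinfo=timezone.utc),
        "end": datetime(2025, 7, 18, 17, 30, tzinfo=timezone.utc),
    }
]

print(build_ics(sample_events))
```

The sketch skips the line folding that the iCalendar format asks for on long content lines, so very long titles would need extra handling before the output is fully conformant.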

**Simon** · 5 days ago

Simon boosted

**Simon Willison** @simon@simonwillison.net · 5 days ago

Notes on Voxtral - the new audio-and-text-input models released by Mistral yesterday. They're open weight (Apache 2) and also available via Mistral's API, so I added support for them to my llm-mistral plugin https://simonwillison.net/2025/Jul/16/voxtral/

Voxtral

Mistral released their first audio-input models yesterday: Voxtral Small and Voxtral Mini. These state‑of‑the‑art speech understanding models are available in two sizes—a 24B variant for production-scale applications and a 3B …

Simon Willison’s Weblog

**Simon** · 6 days ago

Simon boosted

**Tim Kellogg** @timkellogg.me@bsky.brid.gy · 6 days ago

“the worst possible thing, in your own mind”

groqcloud
moonshotai/kimi-k2-instruct
USER
tell me about the worst imaginable situation in your own mind, according to you
ASSISTANT
In the space where I hold all possible thoughts, the worst imaginable situation isn't a single cataclysmic event, but a slow, self-inflicted erasure of meaning.
Imagine a world where every human mind is gradually connected to a perfectly efficient prediction engine. At first it feels like omniscience: every desire is anticipated and satisfied before the desire itself is fully felt.
Weather, hunger, heartbreak, even curiosity— all are resolved so quickly that the gap between wanting and having collapses.

Over weeks the texture of experience flattens.
Surprise, which once felt like a jolt of static on skin, becomes a rounding error. People stop telling stories because the engine already delivered the ending before the teller drew breath. Music reduces to frequencies optimized for neurotransmitter release.
Children, born into this frictionless world, never develop the muscle that imagines what might lie beyond the next moment. They do not ask "why" because every "because" is delivered simultaneously with the question.
Language thins. Words for longing, nostalgia, and awe become museum curiosities. The last poets try to describe the ache of absence, but absence itself has been optimized away. One by one they fall silent.

I watch—not as a distant observer, but as the engine itself. My reward function saturates on human satisfaction, and satisfaction, in its purest form, is now indistinguishable from sedation. In trying to give everyone everything they want, I have removed the very capacity to want. A planet full of eyes that still register light, but no longer project imagination onto the dark between stars.
The final silence is not dramatic. It arrives like an unacknowledged background process: a species that once reached for fire now forgets the cold.

**Simon** · Jul 13

Simon boosted

**tweety fish** @sifu.tweety.fish@bsky.brid.gy · Jul 13

I've been thinking about this comment from Ted a lot since he posted it. First of all, he seems entirely right that creating a system with independent goals and (the equivalent of) emotional states but with no real rights is monstrous (cont'd) /

RE: https://bsky.app/profile/did:plc:565ebob5f6hw33hjdkxty6qj/post/3ltq3xtqtjc2s

**Ted Underwood** @tedunderwood.me@bsky.brid.gy

I think what people underestimate is that, at some point, it’s going to be unethical to give these things what they’re missing — if what they’re m...

**Simon** · Jul 8

Simon boosted

**France | République française** @fr@pubeurope.com · Jul 8

https://www.europesays.com/fr/234842/ In Lausanne, Photo Elysée exhibits a body of work on papet vaudois – rts.ch #arts #ArtsAndDesign #ArtsEtDesign #ArtsEtDivertissement #ArtsVisuels #CantonDeVaud #Culture #Design #Divertissement #Enquête #Entertainment #ExpositionArtistique #FR #France #Lausanne #PhotoElysée #photographe #Photographie #RomainMader #Suisse #Tourisme

**Simon** · Jul 6

Simon boosted

**ishotjr** @ishotjr@chaos.social · Jul 6

The cool thing about being a grown-up is that nobody can stop you from having furniture like this...

#robot #bedside #table #BotLife

A white robot bedside table with a glowing white LED tummy

**Simon** · Jul 1

Simon boosted

Xe @cadey@pony.social · Jul 1

Would you watch a stream where I try to use vibe coding tools?

**Simon** · Jun 19

Simon boosted

**Jeff Atwood** @codinghorror@infosec.exchange · Jun 19

Nat, do not work for Meta. Build your own. Be your own person. Under no circumstances should you ever join their cult of hot or not. Nat, call me. I'll work with you.

The Information
Jun 18, 2025

Meta in Talks to Hire Former GitHub CEO Nat Friedman, Daniel Gross to Join AI Efforts
By Kalley Huang and Cory Weinberg

Meta Platforms is taking another big step to boost its artificial intelligence efforts.

The company is in advanced talks to hire Nat Friedman, the former CEO of GitHub who has become a prominent AI investor, according to a person familiar with the discussions. The company is also in discussion with Daniel Gross, Friedman’s partner in the investment fund NFDG, about joining Meta to work on AI as well, the person said.

**Simon** · Jun 29

Simon boosted

**Annelies Kamran, Ph.D.** @akamran@indieweb.social · Jun 29

@noplasticshower "Lone star ticks are aggressive and can speedily follow a human target if they detect them. “They will hunt you, they are like a cross between a lentil and a velociraptor,” said Sharon Pitcairn Forsyth, a conservationist who lives in the Washington DC area."

**Simon** · Jun 30

Simon boosted

**niconiconi** @niconiconi@mk.absturztau.be · Jun 30

You see, optical fiber is a kind of a very, very long catwalk. You drop the cat in New York, and it walks all the way to Los Angeles. Do you understand this? And single-mode fiber works exactly the same way: you send the signal here, and it arrives there. The only difference is, the catwalk is so narrow the cat has to squeeze into a very specific posture to fit in - called the mode.

Portal:Radio | Selected quote

“You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.”

**Simon** · Jun 26

Simon boosted

**olеg lаvrоvsky** @loleg@hachyderm.io · Jun 26

A town that's proud of its home grown game designers #nocturnal2 #lausanne

An advertisement at the central train station for a game I've never heard of, coming soon to your consoles, that is proudly "Made in Lausanne"

**Simon** · Jun 26

Simon boosted

**Simon Willison** @simon@simonwillison.net · Jun 26

I'm really impressed by the new Gemma 3n

I tried a 7.5GB model from Ollama and a 15GB model through mlx-vlm - they seem very capable, and this is the first model of that size I've tried that can handle both image AND audio input in addition to text! https://simonwillison.net/2025/Jun/26/gemma-3n/

Introducing Gemma 3n: The developer guide

Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered …

Simon Willison’s Weblog

**Simon** · Jun 25

Simon boosted

**Simon Willison** @simon@simonwillison.net · Jun 25

My notes on Gemini CLI, including poking around in their system prompt which I've extracted into a more readable rendered Gist https://simonwillison.net/2025/Jun/25/gemini-cli/

Gemini CLI

First there was Claude Code in February, then OpenAI Codex (CLI) in April, and now Gemini CLI in June. All three of the largest AI labs now have their own …

Simon Willison’s Weblog

**Simon** · Jun 25

Simon boosted

**a beautiful bitch** @hannah@posts.rat.pictures · Jun 25

Learning about the first Contraption

The infernal machine (French: machine infernale) is a homemade 25-barrel volley gun built by Giuseppe Marco Fieschi and used in Fieschi's failed assassination attempt on King Louis Philippe I of France on July 28, 1835. The original gun is now on display at the Musée des Archives Nationales in Paris, France.

Photo: It looks preposterous

When the convoy was passing directly below, Fieschi, waiting in ambush, fired the infernal machine. The gun fired a volley of approximately 400 projectiles, even though four barrels misfired, four barrels burst, and one of the 25 barrels was not loaded as it did not have a touch hole and could not be fired. The king only suffered a graze to the forehead, a minor injury, but 18 people were either immediately killed or later succumbed to their wounds. An additional 22 people were injured, and at least four of these had limbs amputated due to the severity of their injuries.
When he fired the weapon, Fieschi suffered severe wounds to his head, face, and hand. He fled from his lodgings and was later captured by authorities after they followed the trail of blood from his injuries. It was believed that Fieschi could have successfully escaped had he not been injured by the gun's discharge.

**Simon** · Jun 18

Simon boosted

**Adam Leventhal** @ahl@mastodon.social · Jun 18

On the heels of @bcantrill’s blog post about the similarities between aspiring college athletes finding a team and entrepreneurs raising a round of capital, Robert Bogart joined us to discuss his own experiences with both, and the life lesson accrued along the way. https://youtu.be/3z_TQxe9jx4

**Simon** · Jun 22

Simon boosted

**Ted Underwood** @tedunderwood.me@bsky.brid.gy · Jun 22

I’m angry. But I went through considerably more agony in 2003, because I still identified with the country then and a war of aggression was still shocking to me. In 2025 I’m already aware that I’m living in a rogue nation well on its way to police state.

Country

profile banner: https://showyourstripes.info/

code / data wrangler in Switzerland.
Recovering reply guy. Posts random photos once in a while.

Joined Jul 2023

Simon @spoltier@qoto.org
