It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance one gives the tool, and how one reports the results.

One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over the course of two days, each contestant is given four and a half hours on each day to solve three difficult mathematical problems, given only pen and paper. No communication between contestants (or with the team leader) during this period is permitted, although the contestants can ask the invigilators for clarification on the wording of the problems. The team leader advocates for the students in front of the IMO jury during the grading process, but is not involved in the IMO examination directly.

The IMO is widely regarded as a highly selective measure of mathematical achievement: it is a significant accomplishment for a high school student to score well enough to receive a medal, particularly a gold medal or a perfect score. This year the threshold for the gold was 35/42, which corresponds to answering five of the six questions perfectly. Even answering one question perfectly merits an "honorable mention". (1/3)

**sojournTime** · Jul 19, 2025, 18:55 *

sojournTime boosted

**Terence Tao** @tao@mathstodon.xyz · Jul 19, 2025, 18:55 *

But consider what happens to the difficulty level of the Olympiad if we alter the format in various ways, such as the following:

* One gives the students several days to complete each question, rather than four and a half hours for three questions. (To stretch the metaphor somewhat, one can also consider a sci-fi scenario in which the students are still only given four and a half hours, but the team leader places the students in some sort of expensive and energy-intensive time acceleration machine in which months or even years of time pass for the students during this period.)
* Before the exam starts, the team leader rewrites the questions in a format that the students find easier to work with.
* The team leader gives the students unlimited access to calculators, computer algebra packages, formal proof assistants, textbooks, or the ability to search the internet.
* The team leader has the six-student team work on the same problem simultaneously, communicating with each other on their partial progress and reported dead ends.
* The team leader gives the students prompts in the direction of favorable approaches, and intervenes if one of the students is spending too much time on a direction that they know to be unlikely to succeed.
* Each of the six students on the team submits solutions to the team leader, who then selects only the "best" solution for each question to submit to the competition, discarding the rest.
* If none of the students on the team obtains a satisfactory solution, the team leader does not submit any solution at all, and silently withdraws from the competition without their participation ever being noted. (2/3)

**sojournTime** · Jul 19, 2025, 18:56 *

sojournTime boosted

**Terence Tao** @tao@mathstodon.xyz · Jul 19, 2025, 18:56 *

In each of these formats, the submitted solutions are still technically generated by the high school contestants, rather than the team leader. However, the reported success rate of the students on the competition can be dramatically affected by such changes of format; a student or team of students who might not even always reach bronze medal performance if taking the competition under standard test conditions might instead reach reliable gold medal performance under some of the modified formats indicated above.
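As a rough illustration of the selection effect alone, suppose (purely as an assumption for the sake of arithmetic; no such figures appear above) that each of $n$ attempts at a problem succeeds independently with probability $p$, and that only a successful attempt, if any, is reported. The chance that at least one attempt succeeds is then:

```latex
P(\text{at least one success}) = 1 - (1 - p)^{n},
\qquad
p = 0.2,\ n = 6:\quad 1 - 0.8^{6} = 1 - 0.262144 \approx 0.74 .
```

Under these hypothetical numbers, a 20% single-attempt success rate would be reported as roughly 74%, before any of the other format changes are taken into account. Real attempts are unlikely to be independent, so this should be read only as an order-of-magnitude sketch.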

So, in the absence of a controlled test methodology that was not self-selected by the competing teams, one should be wary of making overly simplistic apples-to-apples comparisons between the performance of various AI models on competitions such as the IMO, or between such models and the human contestants.

Related to this, I will not be commenting on any self-reported AI competition performance results for which the methodology was not disclosed in advance of the competition. EDIT: In particular, the above comments are not specific to any single result of this nature.
(3/3)

**sojournTime** · Jun 11, 2025, 23:35

sojournTime boosted

**404 Media** @404mediaco@mastodon.social · Jun 11, 2025, 23:35

Wikipedia Pauses AI-Generated Summaries After Editor Backlash

🔗 https://www.404media.co/wikipedia-pauses-ai-generated-summaries-after-editor-backlash/

**sojournTime** · May 29, 2025, 06:41

sojournTime boosted

**David Ho** @davidho@mastodon.world · May 29, 2025, 06:41

No words.

"…U.S. State Department will work with the Department of Homeland Security to aggressively revoke visas for Chinese students…"  https://www.state.gov/releases/office-of-the-spokesperson/2025/05/new-visa-policies-put-america-first-not-china/

**sojournTime** · May 21, 2025, 06:05

sojournTime boosted

**JA Westenberg** @Daojoan@mastodon.social · May 21, 2025, 06:05

Curiosity has a natural predator.

It’s called optimization.
https://www.joanwestenberg.com/how-convenience-kills-curiosity/

**sojournTime** · May 18, 2025, 02:32

sojournTime boosted

**James Scholes** @jscholes@dragonscave.space · May 18, 2025, 02:32

Thanks @masonasons for the tip that yt-dlp can download from RSS feeds. Here's the command I came up with to download all available podcast items from a feed in chronological order (oldest first) to nicely numbered files with the title and date.

yt-dlp --no-abort-on-error --color "no_color" --download-archive ".download_history" --windows-filenames --embed-metadata --embed-chapters --playlist-items "::-1" --output "%(n_entries+1-playlist_index)02d %(title)s (%(upload_date>%B %d %Y)s).%(ext)s" --format bestaudio "https://example.com/rss"
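For readability, the same command can be split across lines with a note on each option. This is only a sketch: the function name `download_feed` and the `YTDLP` override variable are illustrative assumptions and not part of the original command, while the options and their values are copied unchanged.

```shell
#!/bin/sh
# Annotated form of the one-line command. download_feed and YTDLP are
# hypothetical names introduced here for illustration only.
#
#   --no-abort-on-error      keep going when a single item fails
#   --color no_color         no colour codes in terminal output
#   --download-archive FILE  record finished items so reruns skip them
#   --windows-filenames      keep file names valid on Windows
#   --embed-metadata         write metadata tags into the file
#   --embed-chapters         write chapter markers into the file
#   --playlist-items "::-1"  process the feed in reverse order
#   --output TEMPLATE        number, title and upload date in the name;
#                            the number counts from the end of the feed
#   --format bestaudio       pick the best audio-only format
download_feed() {
    feed_url=$1
    "${YTDLP:-yt-dlp}" \
        --no-abort-on-error \
        --color "no_color" \
        --download-archive ".download_history" \
        --windows-filenames \
        --embed-metadata \
        --embed-chapters \
        --playlist-items "::-1" \
        --output "%(n_entries+1-playlist_index)02d %(title)s (%(upload_date>%B %d %Y)s).%(ext)s" \
        --format bestaudio \
        "$feed_url"
}
```

Calling `download_feed "https://example.com/rss"` runs the command against that feed. Setting `YTDLP=echo` first prints the arguments instead of downloading, which is a cheap way to check the quoting.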

**sojournTime** · May 18, 2025, 06:44

sojournTime boosted

**JA Westenberg** @Daojoan@mastodon.social · May 18, 2025, 06:44

Email is the cockroach of the internet - it outlives every wave trying to kill it. Forget Slack, forget Discord, forget chat apps. Email is universal, decentralized, and asynchronous. It's not sexy, but it's the ultimate survivor.

**sojournTime** · Apr 26, 2025, 19:38 *

sojournTime boosted

**Neil Kandalgaonkar** @neilk@xoxo.zone · Apr 26, 2025, 19:38 *

It's certainly possible that a new knowledge-sharing paradigm could eventually bloom, one that's native to the properties of a distributed network.

But if you want to preserve the value of Wikipedia _today_, its connection to audiences _today_, you're not going to win by dodging it with clever tech.

You have to actually fight this.

**sojournTime** · Apr 26, 2025, 19:36 *

sojournTime boosted

**Neil Kandalgaonkar** @neilk@xoxo.zone · Apr 26, 2025, 19:36 *

Many nerds dream about less-censorable distributed tech, and think a great event like this will finally make their dream relevant. Move Wikipedia over and the audience will switch!

The audience will not switch. Distributed networks with no chokepoints are possible, but are always inconvenient or insecure. The audience was already finding it more convenient to chat with AIs.

The audience may not even be allowed to switch! The government can easily influence device manufacturers.

**sojournTime** · Apr 26, 2025, 19:09

sojournTime boosted

**Neil Kandalgaonkar** @neilk@xoxo.zone · Apr 26, 2025, 19:09

Beloved programming community: many of you are hearing about the US DoJ threatening Wikipedia.

Some of you are thinking of ways to thwart this. Download the Wikipedia dumps, put it on IPFS or hand-couriered USB drives or other less-censorable systems.

A good impulse, but missing the point.

Wikipedia is not just a big document or a software artifact.

Its true value is that it is effortlessly available to a wide audience, can be updated rapidly, with no preconditions to view or edit.

**sojournTime** · Apr 22, 2025, 21:37

sojournTime boosted

**GeekMomProjects** @geekmomprojects@mastodon.social · Apr 22, 2025, 21:37

The diffusers on the new LED polyhedron (dihedral hexacontahedron, right) looked so much better than the first build that I had to print new ones for the truncated icosahedron (left). I also synced up the patterns on both builds (with #pixelblaze) and will be sitting here staring at them for the rest of the day.

5d77c9fbd1ce80c2.mp4

**sojournTime** · Apr 19, 2025, 21:50

sojournTime boosted

**Tattie** @Tattie@eldritch.cafe · Apr 19, 2025, 21:50

Which one of you Fedi weirdos* was at the Edinburgh trans rights demo holding the sign saying:

private String gender
not
public const bool gender

* affectionate

**sojournTime** · Mar 27, 2025, 08:49

sojournTime boosted

**Mark McCaughrean** @markmccaughrean@mastodon.social · Mar 27, 2025, 08:49

Wow – didn’t think I’d be in tears today, but this message sent home from Gaia as it was shut down forever today hits hard 😭

What you’re seeing is a map of the 106 CCD detectors that Gaia used to measure the positions of billions of stars in the Milky Way for the past 11 years 🛰️✨

They were turned off in a special sequence … 😕

#SpaceScience #Astronomy #Science

8686a5b242697f16.jpeg

**sojournTime** · Mar 25, 2025, 10:32

sojournTime boosted

**Ted Pavlic (he/him)** @tedpavlic@mas.to · Mar 25, 2025, 10:32

Yeah, it can happen to pilots too!

ABC7 Los Angeles: "United Airlines flight from LAX bound for China turns around after pilot forgets passport"
https://abc7.com/post/united-airlines-flight-bound-china-turns-around-pilot-forgets-passport/16077746/

**sojournTime** · Mar 31, 2025, 17:07

sojournTime boosted

**Morgunin** @Morgunin@rollenspiel.social · Mar 31, 2025, 17:07

Friendly reminder.

#Aprilfools #Aprilscherz #aprilfoolsday

61a00463041f5674.png

**sojournTime** · Mar 28, 2025, 21:32

sojournTime boosted

**zen3ger** @zen3ger@fosstodon.org · Mar 28, 2025, 21:32

About a year ago, my parents made the switch to Linux on their home machines because they really hated Win 10... Today I got a call from my mother to help her out with something, but I did not expect that "something" would be figuring out a sed pattern for a shell script she wrote to bulk rename files.

When I asked her why she didn't use some GUI program she said "I was an accountant in the DOS era, this makes more sense to me than a ribbon menu in Excel".