"I would absolutely love to discover the original code review for this and why this was chosen as a default. If the PRs from 2011 are any indication, it was probably to get unit tests to pass faster."

This was a pretty interesting read.

withinboredom.info/blog/2022/1

@nyquildotorg Here you go, from the code's author: news.ycombinator.com/item?id=3

And John Nagle commenting on the tradeoffs and problems: news.ycombinator.com/item?id=3

And the rest of the comments wander, in a roundabout way, into wondering why git-lfs is making write syscalls 50 bytes at a time in the first place, because that's going to suck regardless of what the network stack does beyond that.

@danderson @nyquildotorg Gotta admit, I'm team TCP_NODELAY. Nagle has surprised people for generations, and "But I want to take 20k syscalls to transfer 1MB and have it look efficient" doesn't make me very sympathetic!

@sgf @danderson @nyquildotorg Nagle gets used to "fix up" a bunch of problems (e.g. silly window syndrome, etc).

In general, there are two types of flow: elephants (bandwidth-heavy) and mice (latency-sensitive). You want Nagle for the first class (keep overheads as low as possible for maximum throughput), and not for the second (keep latency as low as possible).

@isomer @sgf @danderson @nyquildotorg

I would expect that elephants will buffer (and when they temporarily become mice they will either reshuffle things so that the buffered writer is out of the picture, or they will simply keep flushing the writer at appropriate times). If that's the case, then by disabling Nagle's algorithm we're wasting at most one packet each time the buffer is emptied (pessimistically, we will emit one one-byte packet then). So Nagle should be superfluous if we buffer with a buffer that's much larger than a packet, or one whose size is a multiple of the packet size. Am I missing some reason Nagle is useful?
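(A minimal Python sketch of that userspace-buffering idea; the 50-byte records, 64 KiB buffer, and endpoint are arbitrary illustrative choices:)

```python
# Sketch: wrap the socket in a userspace buffered writer so many tiny
# application-level writes coalesce into a few large write() syscalls,
# which makes Nagle largely irrelevant for the bulk-transfer case.
# Host, port, record size, and buffer size are arbitrary for illustration.
import socket

sock = socket.create_connection(("example.com", 9000))
writer = sock.makefile("wb", buffering=64 * 1024)

for _ in range(20_000):
    writer.write(b"x" * 50)   # accumulates in userspace, no syscall per record
writer.flush()                # a handful of large writes to the kernel
writer.close()
sock.close()
```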

@robryk @sgf @danderson @nyquildotorg that's all true, but you can't always keep your buffer full, e.g. when reading data from disk, especially over long, fast networks.

Connections can often flip between mice and elephants. It's common to say "do you want this data?" then wait for a reply, then send the entire data. The first part is latency sensitive, the second part is bandwidth heavy.

@isomer @sgf @danderson @nyquildotorg

If the buffer is not full, the buffered writer will not write until it gets full. People who deal with buffered writers are used to flushing them at appropriate times (and there are all those funny affordances, like stdio flushing stdout when someone reads from stdin).

@robryk @isomer @danderson @nyquildotorg I've now got the dev part of my brain going "You could argue that Nagle is just a defense against badly-written programs that can't buffer properly", and the SRE part of my brain going "Yes! And we need defenses against badly written programs!".

Can we rename TCP_NODELAY to TCP_TRUSTME?

@sgf @robryk @danderson @nyquildotorg the application doesn't know what the network is doing. It doesn't know if it's slow to respond, or has high bandwidth or whatever.

The kernel doesn't know if the application is currently bandwidth driven or latency driven.

And all these factors change constantly.

The best way is for the application to be explicit about whether, after sending these bytes, it will expect a response in a timely manner or not.
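(One concrete mechanism along these lines, sketched in Python: Linux's MSG_MORE send flag, which isn't mentioned in the thread but lets a sender say "more of this message is coming" per call. Endpoint and data are placeholders:)

```python
# Sketch of one way to be explicit on Linux: MSG_MORE tells the kernel
# "more data follows, don't send a small packet yet"; omitting it on the
# last chunk says "message complete, send now". MSG_MORE is not something
# from the thread, just one mechanism that matches the idea described above.
import socket

sock = socket.create_connection(("example.com", 9000))  # placeholder endpoint
header = b"HDR:"
body = b"payload bytes"

if hasattr(socket, "MSG_MORE"):            # Linux-only flag
    sock.send(header, socket.MSG_MORE)     # hold: more of this message coming
    sock.send(body)                        # no flag: complete, send it
else:
    sock.sendall(header + body)            # portable fallback: one big write
sock.close()
```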

@isomer @sgf @danderson @nyquildotorg

> The best way is for the application to be explicit about whether, after sending these bytes, it will expect a response in a timely manner or not.

Precisely. And anyone who uses buffering internally already has to do that, lest the whole thing deadlock.
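(That discipline, as a small Python sketch; the length-prefixed framing and the function name are made up for illustration:)

```python
# Sketch of "flush before you expect a reply": without the flush(), the
# request could sit in the userspace buffer while both sides wait forever.
# The length-prefixed framing and names here are hypothetical.
import socket

def request(writer, reader, payload: bytes) -> bytes:
    writer.write(len(payload).to_bytes(4, "big"))
    writer.write(payload)
    writer.flush()                          # be explicit: a reply is expected now
    n = int.from_bytes(reader.read(4), "big")
    return reader.read(n)

# Usage (against some hypothetical length-prefixed server):
# sock = socket.create_connection((HOST, PORT))
# reply = request(sock.makefile("wb"), sock.makefile("rb"), b"do you want this data?")
```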

@robryk @sgf @danderson @nyquildotorg right.

However, due to historical reasons, userspace generally doesn't buffer network connections and instead relies on the kernel's send buffer. The kernel often has a much better idea of the network's performance, which lets it tune buffer sizes better than userspace can (although it often doesn't do a great job there either, causing head-of-line blocking).

@robryk @sgf @danderson @nyquildotorg my guesses:
* RAM was very expensive; if you needed to buffer in kernel space anyway, why have even more buffering?
* Most applications were single-threaded, and you couldn't tell when to reasonably flush (other than flushing on every write).
* stdio never supported sockets.
* Originally most apps were latency-sensitive (e.g. telnet).
* People were trying to keep a similar API surface for datagrams/streams.

@isomer @sgf @danderson @nyquildotorg

> * Most applications were single threaded and you couldn't tell when to reasonably flush (other than flushing on every write).

Wouldn't flushing on every _read_ be the reasonable thing to do?

@robryk @isomer I imagine there are fun corner cases, such as a) "Is selecting to see if data is available a read or not?" - either an event loop always flushes, or you might end up waiting for a response to data not sent yet, and b) Does it have to be the same socket? - e.g. what happens with FTP control and data connections?

@sgf @isomer

If we treat polling that could be interrupted by a read as a read, then (a) is fixed. (b) is obviously a problem, just like the hack where stdio assumes that input on stdin may be expected as a response to output on stdout.

IOW it's a hack, like the stdio thing and arguably like Nagle.
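(A Python sketch of that "flush whenever you go to read or poll" idea; the wrapper class and its method names are hypothetical:)

```python
# Sketch of "treat polling as a read": flush the outgoing userspace buffer
# whenever the caller waits for input, so a reply can't be stuck behind
# unsent request bytes. The class and method names are hypothetical.
import selectors

class FlushOnReadConn:
    def __init__(self, sock):
        self.sock = sock
        self.writer = sock.makefile("wb", buffering=64 * 1024)
        self.sel = selectors.DefaultSelector()
        self.sel.register(sock, selectors.EVENT_READ)

    def write(self, data: bytes) -> None:
        self.writer.write(data)            # buffered; no flush yet

    def recv(self, n: int) -> bytes:
        self.writer.flush()                # reading implies we expect a reply
        return self.sock.recv(n)

    def wait_readable(self, timeout=None):
        self.writer.flush()                # a select/poll counts as a read, per (a)
        return self.sel.select(timeout)
```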

@robryk @isomer If select counts as a read, a select-based naive implementation of telnet (*) would flush every character typed, thus undermining the original motivation for Nagle!

((*) IIUC early telnet forked subprocesses to handle input and output, but select came in in late '83, while the Nagle RFC was published in '84.)


@sgf @isomer

How does the implementation you have in mind behave when incoming data arrives more quickly than it's ingested? The way I imagine an implementation that does what you describe, it would also grow its memory usage without bound by buffering the input.

@robryk I think there's some disconnect over assumptions here, because I don't think what I was meaning to write has anything to do with... receive-side buffer sizing?

@sgf

Ah, I forgot that terminals have an internal buffer and are usually ready for writing ^^*

(So, steady state is "terminal and socket available for writing, neither available for reading", so we'll be continuously asking to wait for the socket to become readable.)

Sorry for the confusion, you were right.
