someone suggested hashing emoji to prevent loading duplicates under different names, but I'd need somewhere persistent to store the hashes if I didn't want to rehash all 3,270 emoji every time a new one showed up.

@moonman Don't PNG files have a built-in hash? It's an Adler-32, meaning somebody could easily force a collision with a maliciously crafted image, but it'd be really quick to grab from the images.

@wizzwizz4 I'll investigate this possibility, thanks. Yeah I'm not worried at all about malicious hashes.

@moonman @wizzwizz4

It won't help as much as you hope.

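First, the checksums cover the encoded stream rather than the pixels: the chunk CRCs are computed over the compressed bytes, and zlib's Adler-32 is computed over the filtered scanlines, which depend on the encoder's filter choices. The whole raison d'être for tools like OptiPNG is that there's more than one way to encode the same grid of pixels. This isn't FLAC, where the format contains an MD5 hash of the decoded audio that allows verifying bit-exact decompressed output.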

Second, it's not the overall PNG that's CRC'd, but the chunks. Each chunk consists of a 4-byte data length, a 4-byte type code, the data (can be zero-length), and a 4-byte CRC calculated over the type code and the data. On top of that, the ordering rules for the chunks don't impose a total order, so the same set of chunks can legally appear in more than one sequence.

w3.org/TR/PNG/#5DataRep

(The rationale for the design is at w3.org/TR/PNG-Rationale.html#R)
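To make the chunk layout concrete, here's a minimal sketch in Python that walks the chunks of a PNG held in memory and checks each CRC. The name `iter_chunks` is just something I made up for illustration; only `struct` and `zlib` from the standard library are used, and the CRC is taken over the type code plus the data, as the spec describes.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png_bytes):
    """Yield (type, data, stored_crc, crc_ok) for each chunk in a PNG byte string.

    Hypothetical helper for illustration, not part of any library.
    """
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png_bytes):
        if pos + 12 > len(png_bytes):
            raise ValueError("truncated chunk header")
        # 4-byte big-endian length of the data field only
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        end = pos + 12 + length
        if end > len(png_bytes):
            raise ValueError("truncated chunk data")
        ctype = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", png_bytes[end - 4:end])
        # The CRC covers the type code and the data, but not the length field
        computed = zlib.crc32(ctype + data) & 0xFFFFFFFF
        yield ctype, data, stored_crc, stored_crc == computed
        pos = end
```

Note that the IDAT CRCs you get out of this still describe the compressed bytes, so two files with identical pixels can disagree on every one of them.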

Have you considered calculating and caching a hash on the raw pixel data after loading, and then using a (path, size, mtime) triple as the cache key? That should allow a Good Enough™ check on each emoji using only a single stat call.
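A rough sketch of that idea in Python, under some assumptions: `PixelHashCache` and the `decode` callable are hypothetical names, SHA-256 is an arbitrary choice of hash, and the cache here is only an in-memory dict, so you'd still need to dump it somewhere persistent (JSON, SQLite, whatever) between runs.

```python
import hashlib
import os


class PixelHashCache:
    """Cache pixel-data hashes keyed by (path, size, mtime).

    Hypothetical sketch: `decode` is any callable that takes a path and
    returns the decoded pixels as bytes. With Pillow, one option (an
    assumption, not something from this thread) would be:
        lambda p: Image.open(p).convert("RGBA").tobytes()
    """

    def __init__(self, decode):
        self._decode = decode
        self._cache = {}  # (abs path, size, mtime_ns) -> hex digest

    def digest(self, path):
        st = os.stat(path)  # the single stat call per emoji
        key = (os.path.abspath(path), st.st_size, st.st_mtime_ns)
        if key not in self._cache:
            # Cache miss: decode and hash the pixels once
            pixels = self._decode(path)
            self._cache[key] = hashlib.sha256(pixels).hexdigest()
        return self._cache[key]
```

Two emoji are then treated as duplicates when their digests match, regardless of how each file was compressed. If you hash raw pixel bytes like this, you'd probably also want to mix the image dimensions into the hash so a 2x8 image can't match a 4x4 one with the same bytes.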

@ssokolow @wizzwizz4 in practice the files are bit-for-bit identical because people just copy the png emoji from other servers and change the name.

@moonman @wizzwizz4 Are you sure about that?

I tend to run tools like OptiPNG, AdvanceCOMP, and jpegoptim over the image files I serve to save bandwidth... oh and `pngcrush -rem gAMA -rem alla -rem cHRM -rem iCCP -rem sRGB -rem time`.
