**robryk** @robryk@qoto.org · Dec 10, 2024

We have a practical case of #Unicode encoding of country flags causing problems.

To recap, flags are encoded as sequences of codepoints corresponding to the letters of the country's ISO code (so the Polish flag is <flag-p> <flag-l>). No heed is paid to the fact that flags change over time.
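As a minimal sketch of that encoding in Python: the helper name `flag_sequence` is made up for illustration, and the only fact it relies on is that the regional indicator symbols for A through Z occupy U+1F1E6 through U+1F1FF.

```python
# U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A; the letters B..Z follow in order.
REGIONAL_INDICATOR_A = 0x1F1E6


def flag_sequence(region_code: str) -> str:
    """Map a two-letter region code to its pair of regional indicator symbols.

    `flag_sequence` is a hypothetical helper, not part of any library.
    """
    if len(region_code) != 2 or not (region_code.isascii() and region_code.isalpha()):
        raise ValueError(f"expected a two-letter ASCII region code, got {region_code!r}")
    code = region_code.upper()
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


poland = flag_sequence("PL")
print([f"U+{ord(c):04X}" for c in poland])  # ['U+1F1F5', 'U+1F1F1']

# The stored bytes depend only on the letters, never on the flag artwork.
print(flag_sequence("SY").encode("utf-8").hex())  # f09f87b8f09f87be
```

Nothing in the sequence records which flag design was current when the text was written; the picture is chosen by whatever font renders it.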

The Syrian flag will at some point start being rendered differently. Then, all the previous statements about Assad's government that used the flag will start rendering as if they were about the rebels.

I'm sad about Unicode's failure to fully and immutably encode the meaning intended by whoever wrote the text (see Han unification for a counterexample to "fully").

**Charlotte Eiffel Lilith Buff** @CharlotteBuff@mastodon.social · Dec 10, 2024

@robryk Well, the meaning of regional indicator sequences is “the entity that corresponds to this region code”. Little flag images are just the most common visual presentation for that, and flags are inherently not stable. Nothing much the text encoding can do about that.
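To illustrate the point that the sequence carries a region code rather than a picture, here is a small Python sketch that recovers the code from the text. The function name `region_code` is hypothetical; it assumes only that the regional indicator symbols for A through Z occupy U+1F1E6 through U+1F1FF.

```python
# U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A; the letters B..Z follow in order.
REGIONAL_INDICATOR_A = 0x1F1E6


def region_code(flag: str) -> str:
    """Recover the two-letter region code from a regional indicator pair.

    `region_code` is a hypothetical helper, not part of any library.
    """
    offsets = [ord(c) - REGIONAL_INDICATOR_A for c in flag]
    if len(offsets) != 2 or not all(0 <= o < 26 for o in offsets):
        raise ValueError("expected exactly two regional indicator symbols")
    return "".join(chr(ord("A") + o) for o in offsets)


# The escapes below are the two codepoints U+1F1F8 and U+1F1FE.
print(region_code("\U0001F1F8\U0001F1FE"))  # SY
```

The round trip yields only the letters "SY"; which flag image a reader sees for them is decided by the font, outside the encoded text.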

**robryk** @robryk@qoto.org · 2024-12-10T22:38:19Z

@CharlotteBuff

The thing that I complain about is that I can write a piece of text that, when rendered, has some meaning, and the rendering of that very same byte sequence (e.g. in message logs) later will convey a different meaning.

A text encoding can choose carefully how the symbols that can be encoded are defined, so that they won't be misused (as I suppose you'd call using regional-s regional-y to represent the Assad regime) in ways that are unambiguous at one point in time and have an unambiguously different meaning at another.

**Charlotte Eiffel Lilith Buff** @CharlotteBuff@mastodon.social · Dec 10, 2024 *

@robryk If the intended meaning of a text is directly tied to a very specific rendering of that text instead of the underlying semantics (which is what Unicode deals with) then the bytes alone actually hold very little information. I understand that most people think of emoji as images instead of text, and that causes all kinds of UX issues, but the differences between these two types of data are crucial in cases like this.

**Charlotte Eiffel Lilith Buff** @CharlotteBuff@mastodon.social · Dec 10, 2024

@robryk Unicode can’t control what “🇸🇾” looks like just as they can’t control what “A” looks like. If an “A” with a pointy top means something different to you than an “A” with a rounded top then the only way to preserve your intended meaning is to include font data alongside the raw text, which is just infeasible for most purposes.
