Is there a page where you can paste a string with unknown-to-you Unicode characters and it explains them all?
Stuff like invisible typographical spaces, zero width stuff, emoji joiners, RTL, emoji variant selectors, math symbols used as “fonts,” control characters, etc.? Not just listing stuff, but also explaining them in friendly terms, and maybe adding a little history?
(Because I want to build one if that doesn’t exist. I can see it being useful for debugging, but also education.)
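For a sense of what the core of such a page might do, here is a minimal sketch in Python. It walks a string one code point at a time and looks up the official name and general category with the standard `unicodedata` module. The `FRIENDLY_NOTES` table and the `explain` function are hypothetical names made up for illustration, and the friendly wording is only a placeholder for the explanations and history a real tool would carry.

```python
import unicodedata

# Hypothetical hand-written notes for a few tricky characters.
# A real tool would need a far larger, curated table.
FRIENDLY_NOTES = {
    0x00A0: "no-break space: looks like a space but prevents wrapping",
    0x200B: "zero width space: invisible, allows a line break",
    0x200D: "zero width joiner: glues emoji or letters into one glyph",
    0x200F: "right-to-left mark: invisible, nudges text direction",
    0xFE0F: "variation selector-16: asks for the emoji-style rendering",
}


def explain(text):
    """Return one (code point, name, category, note) row per character."""
    rows = []
    for ch in text:
        cp = ord(ch)
        # Control characters have no name in the database, hence the fallback.
        name = unicodedata.name(ch, "<unnamed>")
        category = unicodedata.category(ch)
        note = FRIENDLY_NOTES.get(cp, "")
        rows.append((f"U+{cp:04X}", name, category, note))
    return rows


if __name__ == "__main__":
    for row in explain("a\u200db\u00a0c"):
        print(*row, sep="  |  ")
```

This works per code point, so it does not yet use the context of the whole string (for example, noticing that a joiner sits between two emoji).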
@mwichary
You will love https://decodeunicode.org/
@typographische Oh, it seems broken now but that feels like a promising name!
@typographische Oh, it works now. Yeah. I think I want something string-oriented and not as… nerdy.
@mwichary @typographische Mine is even nerdier - it turns a Unicode string into the JavaScript/C/HTML code to produce it. http://acme.com/unicode/decode.html
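As a rough illustration of that kind of conversion, here is a small Python sketch that turns a string into a JavaScript string literal and into HTML numeric character references. It is not the code behind the linked page and does not claim to match its output; the function names `to_javascript` and `to_html` are invented for this example, and the C case is left out.

```python
def to_javascript(text):
    """Build a JavaScript string literal using ES2015 \\u{...} escapes."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isprintable() and ch not in '"\\':
            out.append(ch)  # plain printable ASCII is kept as-is
        else:
            out.append(f"\\u{{{ord(ch):X}}}")  # e.g. \u{200D}
    return '"' + "".join(out) + '"'


def to_html(text):
    """Build HTML text using hexadecimal numeric character references."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isprintable() and ch not in '&<>"':
            out.append(ch)
        else:
            out.append(f"&#x{ord(ch):X};")  # e.g. &#x200D;
    return "".join(out)


if __name__ == "__main__":
    sample = "a\u200db"
    print(to_javascript(sample))
    print(to_html(sample))
```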
@mwichary @jef @typographische this is a tool that does this last job for Cyrillic and is being used a lot: https://2cyr.com/
I suspect one needs some knowledge of the intention (in this case a clear indication that it's Cyrillic) to be able to do such a job meaningfully.
Otherwise, what would a "right encoding" mean?
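To make that point concrete, here is a hedged Python sketch of one possible heuristic. It is an assumption for illustration, not how 2cyr.com works. It takes garbled text, tries undoing a wrong decoding with a few candidate encodings, and keeps the result containing the most Cyrillic letters. The names `CANDIDATES`, `WRONG_GUESSES`, `cyrillic_score` and `best_repair` are all hypothetical.

```python
# Encodings the bytes might really have been in (an assumed shortlist).
CANDIDATES = ["cp1251", "koi8_r", "cp866", "utf-8"]
# Encodings the bytes might have been wrongly decoded with.
WRONG_GUESSES = ["latin-1", "cp1252"]


def cyrillic_score(text):
    """Count characters in the basic Cyrillic block (U+0400..U+04FF)."""
    return sum(1 for ch in text if "\u0400" <= ch <= "\u04FF")


def best_repair(garbled):
    """Return (repaired text, (wrong, right) encoding pair or None)."""
    best_score, best_text, best_pair = cyrillic_score(garbled), garbled, None
    for wrong in WRONG_GUESSES:
        try:
            raw = garbled.encode(wrong)  # recover the original bytes
        except UnicodeEncodeError:
            continue
        for right in CANDIDATES:
            try:
                fixed = raw.decode(right)
            except UnicodeDecodeError:
                continue
            score = cyrillic_score(fixed)
            if score > best_score:  # strict: on a tie the earlier candidate wins
                best_score, best_text, best_pair = score, fixed, (wrong, right)
    return best_text, best_pair


if __name__ == "__main__":
    garbled = "Привет".encode("cp1251").decode("latin-1")
    print(best_repair(garbled))
```

The scoring only works because the goal (Cyrillic text) is known in advance. Even then, short inputs can tie: for this sample, KOI8-R also maps every byte to a Cyrillic letter, so the result depends on the order of the candidate list rather than on real evidence.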
@mapto @jef @typographische Thanks for sharing this! I will check it out.
@mapto @jef @typographische That’s where a string would help – it can establish a lot more context than one character. Either way, feels like the right heuristic could get you far!