On Twitter I had a thread going this year in which I tried to reflect on the bugs I found throughout the year: how to avoid each kind of bug, what can be learned from it, etc. I will port this idea over to here (I'm still both here and on Twitter for now, we'll see how that goes).
Recently I fixed a bug in PyPy's time.strftime. It was using a unicode helper function that takes as arguments a byte buffer containing a UTF-8 encoded string, as well as the number of code points in that string. strftime was using this API wrong and passing the number of bytes instead.
After finding the bug we tried to make this API more robust by adding a check to the function that counts the code points in the byte buffer and complains if the result differs from the second argument. The check shouldn't be on by default for performance reasons, but it's on during testing.
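A minimal sketch of what such a check could look like. This is plain Python for illustration, not the actual PyPy code, and the names `CHECK_CODEPOINT_COUNTS`, `count_codepoints` and `make_utf8_string` are all made up:

```python
# Hypothetical sketch; none of these names are PyPy's real API.

# Assumed switch: enabled while running tests, disabled otherwise,
# because the check has to walk the whole buffer.
CHECK_CODEPOINT_COUNTS = True


def count_codepoints(utf8_bytes):
    """Count the code points in a valid UTF-8 byte string."""
    # Continuation bytes look like 0b10xxxxxx; every other byte
    # starts a new code point.
    return sum(1 for byte in utf8_bytes if (byte & 0xC0) != 0x80)


def make_utf8_string(utf8_bytes, num_codepoints):
    """Pair a UTF-8 buffer with the code point count the caller claims."""
    if CHECK_CODEPOINT_COUNTS:
        actual = count_codepoints(utf8_bytes)
        if actual != num_codepoints:
            raise ValueError(
                "caller claimed %d code points, buffer has %d"
                % (num_codepoints, actual)
            )
    return utf8_bytes, num_codepoints
```

With the switch on, passing `len(buffer)` as the second argument blows up as soon as the buffer contains anything outside ASCII, which is exactly the mistake strftime was making.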
The reason the bug got away for so long is that if you test only with ASCII characters it works, because number of bytes == number of code points in that case. Lesson: write tests with wider ranges of characters.
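To make the ASCII coincidence concrete, here is a small Python sketch that compares the two lengths for a few strings. The sample strings and the `lengths` helper are just my picks for illustration:

```python
def lengths(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# Hypothetical sample inputs: one pure-ASCII string, then strings with
# 2-byte, 3-byte and 4-byte UTF-8 sequences.
SAMPLES = ["2024-01-01", "März", "1月", "😅"]

for sample in SAMPLES:
    codepoints, num_bytes = lengths(sample)
    # ascii() keeps the output printable even on an ASCII-only terminal.
    print(ascii(sample), "code points:", codepoints, "bytes:", num_bytes)
```

Only the first sample has equal counts, so a test suite that sticks to strings like it can never tell the two apart.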
@pganssle I think the fact that it went unnoticed for a long time means that people don't use tee, and if they do, their objects don't have __copy__ 😅