On Twitter I had a thread going this year in which I reflected on bugs I found throughout the year: how to avoid that kind of bug, what can be learned from it, etc. I'll port the idea over to here and try it out (I'm still both here and on Twitter, we'll see how that goes).

Recently I fixed a bug in PyPy's time.strftime. It was calling a unicode helper function that takes as arguments a byte buffer containing a UTF-8 encoded string and the number of code points in that string. strftime was using this API wrong: it passed the number of bytes instead.

foss.heptapod.net/pypy/pypy/-/
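To make the failure mode concrete, here is a minimal Python sketch. `Utf8Str`, `wrap_buggy` and `wrap_fixed` are hypothetical stand-ins, not PyPy's actual internals. The string object trusts the length it is given, so passing a byte count goes unnoticed for ASCII input but produces a wrong length as soon as a character needs more than one byte.

```python
class Utf8Str:
    """Hypothetical string object: UTF-8 bytes plus a cached code point count."""

    def __init__(self, utf8: bytes, length: int) -> None:
        self.utf8 = utf8
        self.length = length  # trusted as-is: callers must pass code points

    def __len__(self) -> int:
        return self.length


def wrap_buggy(buf: bytes) -> Utf8Str:
    # BUG: passes the number of *bytes* as the number of code points.
    return Utf8Str(buf, len(buf))


def wrap_fixed(buf: bytes) -> Utf8Str:
    # Correct: count code points by decoding the UTF-8 buffer.
    return Utf8Str(buf, len(buf.decode("utf-8")))


ascii_buf = "2024-01-31".encode("utf-8")
accented_buf = "31 d\u00e9c. 2024".encode("utf-8")  # "31 déc. 2024"; é takes 2 bytes

# With ASCII input both versions agree, so the bug stays hidden.
assert len(wrap_buggy(ascii_buf)) == len(wrap_fixed(ascii_buf)) == 10
# With one 2-byte character the buggy length is off by one.
assert len(wrap_buggy(accented_buf)) == 13  # wrong: 13 bytes
assert len(wrap_fixed(accented_buf)) == 12  # right: 12 code points
```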

After finding the bug, we tried to make this API more robust by adding a check to the function: it counts the code points in the byte buffer and complains if that count differs from the second argument. This check shouldn't be on by default, for performance reasons, but it is on during testing.

The reason the bug went unnoticed for so long is that it works if you test only with ASCII characters, because in that case the number of bytes equals the number of code points. Lesson: write tests with a wider range of characters.
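One way to follow that lesson is to run length-related helpers over a fixed set of samples covering every UTF-8 encoding width (1, 2, 3 and 4 bytes per code point). `check_length_helper`, the two counting helpers and the sample strings are illustrative, not taken from PyPy's test suite. The sketch shows that a helper returning the byte count passes an ASCII-only test but fails on the wider set.

```python
# Samples covering each UTF-8 encoding width:
# "abc" (1 byte each), "café" (é = 2 bytes), "€ 5" (€ = 3 bytes), snake emoji (4 bytes).
WIDE_SAMPLES = ("abc", "caf\u00e9", "\u20ac 5", "\U0001F40D")


def check_length_helper(length_of, samples=WIDE_SAMPLES):
    """Return the samples for which length_of(utf8_bytes) != number of code points."""
    failures = []
    for text in samples:
        if length_of(text.encode("utf-8")) != len(text):
            failures.append(text)
    return failures


def byte_count(buf: bytes) -> int:
    return len(buf)  # the buggy behaviour: bytes, not code points


def codepoint_count(buf: bytes) -> int:
    return len(buf.decode("utf-8"))


# An ASCII-only test cannot tell the two helpers apart...
assert check_length_helper(byte_count, ["abc", "%Y-%m-%d"]) == []
# ...but the wider samples expose the buggy one.
assert check_length_helper(byte_count) == ["caf\u00e9", "\u20ac 5", "\U0001F40D"]
assert check_length_helper(codepoint_count) == []
```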

Another bug, this time in itertools.tee: tee has an optimization where, if the iterator has a __copy__ method, it uses that method instead of falling back to its more careful generic implementation. However, PyPy got it wrong and copied the *iterable* instead of the iterator.

foss.heptapod.net/pypy/pypy/-/

This works in simple tests, but in more complicated situations it gives nonsense.
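For reference, here is a simplified pure-Python sketch of how the `__copy__` fast path is meant to work; `tee_sketch` and `RangeIterator` are illustrative names, not PyPy's or CPython's actual code. The key detail is that `__copy__` is looked up and called on the iterator returned by `iter(iterable)`, so every copy resumes from that iterator's current position. The generic fallback is modeled on the deque-based pure-Python equivalent that the `itertools` documentation has given for `tee`.

```python
import collections
import copy


def tee_sketch(iterable, n=2):
    """Simplified tee: split one iterable into n independent iterators."""
    if n < 0:
        raise ValueError("n must be >= 0")
    if n == 0:
        return ()
    it = iter(iterable)
    if hasattr(it, "__copy__"):
        # Fast path: copy the *iterator*, not the iterable, so each copy
        # continues from wherever `it` currently is.
        return (it,) + tuple(copy.copy(it) for _ in range(n - 1))

    # Generic path: buffer each fetched value in one deque per consumer.
    deques = [collections.deque() for _ in range(n)]

    def gen(mydeque):
        while True:
            if not mydeque:
                try:
                    newval = next(it)
                except StopIteration:
                    return
                for d in deques:
                    d.append(newval)
            yield mydeque.popleft()

    return tuple(gen(d) for d in deques)


class RangeIterator:
    """Hypothetical copyable iterator over the integers [current, stop)."""

    def __init__(self, current, stop):
        self.current = current
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

    def __copy__(self):
        return RangeIterator(self.current, self.stop)


it = RangeIterator(0, 5)
next(it)  # consume 0 before splitting
a, b = tee_sketch(it)
assert list(a) == [1, 2, 3, 4]
assert list(b) == [1, 2, 3, 4]
```

When the iterable is its own iterator (as with `RangeIterator`, or with tee objects themselves), copying the iterable and copying the iterator coincide, which may be part of why such a mix-up can survive simple tests.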


@cfbolz Jeez this sounds like it could create some very annoying-to-debug situations. 😅

@pganssle I think the fact that it went unnoticed for a long time means that people don't use tee, and if they do, their objects don't have __copy__ 😅

Qoto Mastodon