gh-91576: Speed up iteration of strings #91574
Force-pushed from 6eeeee0 to 0a84504.
Happy to help review this; let me know when you're ready.
@JelleZijlstra Finished.
gvanrossum left a comment:
Why not use the specialized iterator for all Latin-1 strings?
That would add one more branch instruction, which I was trying to avoid, and Latin-1 strings are rare compared to ASCII ones.
gvanrossum left a comment:
You should just be able to test `PyUnicode_KIND(unicode) == PyUnicode_1BYTE_KIND` to decide which iterator to create, right? Or can the kind change once the object is "ready"?
Given that this is a fixed cost (once per iterator construction), I think the extra branch won't be noticeable. Latin-1 may be rare compared to ASCII, but it still covers some common characters, and supporting it would be essentially free.
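The suggestion above amounts to testing the kind once, when the iterator is created, and then using an iterator whose per-step code never re-checks it. Below is a minimal C sketch of that idea. All names (`str_kind`, `str_iter`, `make_iter`, `ascii_next`, `generic_next`, `sum_code_points`) are hypothetical stand-ins rather than CPython's real API. A function pointer here plays the role that a separate iterator type would play in CPython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kind tags (CPython's real ones are PyUnicode_1BYTE_KIND etc.). */
typedef enum { KIND_ASCII, KIND_UCS1, KIND_UCS2, KIND_UCS4 } str_kind;

typedef struct str_iter str_iter;
typedef int (*next_fn)(str_iter *it, uint32_t *out);

struct str_iter {
    const void *data;  /* character buffer */
    size_t len;        /* number of characters */
    size_t index;      /* next position to read */
    str_kind kind;     /* only consulted by the generic path */
    next_fn next;      /* chosen once, in make_iter() */
};

/* ASCII fast path: reads one byte per step, no test on the kind. */
static int ascii_next(str_iter *it, uint32_t *out) {
    if (it->index >= it->len)
        return 0;
    *out = ((const uint8_t *)it->data)[it->index++];
    return 1;
}

/* Generic path: switches on the kind at every step. */
static int generic_next(str_iter *it, uint32_t *out) {
    if (it->index >= it->len)
        return 0;
    switch (it->kind) {
    case KIND_UCS2: *out = ((const uint16_t *)it->data)[it->index]; break;
    case KIND_UCS4: *out = ((const uint32_t *)it->data)[it->index]; break;
    default:        *out = ((const uint8_t *)it->data)[it->index]; break;
    }
    it->index++;
    return 1;
}

/* The kind is tested once here, per iterator, not once per character. */
static str_iter make_iter(const void *data, size_t len, str_kind kind) {
    assert(data != NULL || len == 0);
    str_iter it = { data, len, 0, kind, NULL };
    it.next = (kind == KIND_ASCII) ? ascii_next : generic_next;
    return it;
}

/* Usage: walk a whole string through whichever iterator was picked. */
static uint32_t sum_code_points(const void *data, size_t len, str_kind kind) {
    str_iter it = make_iter(data, len, kind);
    uint32_t ch, total = 0;
    while (it.next(&it, &ch))
        total += ch;
    return total;
}
```

The trade-off debated in the thread shows up directly in this sketch. `generic_next` pays for its `switch` on every character, while `make_iter` pays for its single comparison once per iterator.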
No, the cost is a branch instruction on every iteration, because ASCII and Latin-1 use different structures.
Hm, couldn't you just store a pointer to the array of bytes (and another to the end) rather than an index? Or is it possible that the bytes move around somehow?
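The alternative suggested above replaces the (buffer, index) pair with a cursor and an end pointer. Below is a minimal sketch. It assumes the byte buffer stays at a fixed address for as long as the iterator exists, which an iterator holding a reference to its string would be expected to guarantee. `byte_cursor`, `cursor_new`, `cursor_next`, and `cursor_sum` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ASCII iterator state: a cursor and an end pointer
   instead of a buffer plus an integer index. */
typedef struct {
    const uint8_t *cur;  /* next byte to yield */
    const uint8_t *end;  /* one past the last byte */
} byte_cursor;

static byte_cursor cursor_new(const uint8_t *data, size_t len) {
    assert(data != NULL);
    byte_cursor c = { data, data + len };
    return c;
}

/* Returns the next byte (0..255), or -1 once the cursor reaches the end. */
static int cursor_next(byte_cursor *c) {
    if (c->cur == c->end)
        return -1;
    return *c->cur++;
}

/* Usage: sum every byte by draining the cursor. */
static long cursor_sum(const uint8_t *data, size_t len) {
    byte_cursor c = cursor_new(data, len);
    long total = 0;
    int ch;
    while ((ch = cursor_next(&c)) != -1)
        total += ch;
    return total;
}
```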
See the `LATIN1` macro in `unicodeobject.c`.
It has to check whether ch is less than 128 and then index a different array depending on the result.
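The per-character branch described above can be modeled in a few lines of C. This sketch is based only on the description in this thread and is not a copy of the real `LATIN1` macro. `char_obj`, `ascii_cache`, `latin1_cache`, `init_caches`, `ascii_char`, and `latin1_char` are hypothetical stand-ins for CPython's cached one-character strings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached one-character string object. */
typedef struct { uint32_t ch; } char_obj;

/* Two separate caches: one for ASCII (0..127), one for 128..255. */
static char_obj ascii_cache[128];
static char_obj latin1_cache[128];

static void init_caches(void) {
    for (uint32_t i = 0; i < 128; i++) {
        ascii_cache[i].ch = i;
        latin1_cache[i].ch = i + 128;
    }
}

/* ASCII-only lookup: a single indexed load, no test on the value. */
static char_obj *ascii_char(uint8_t ch) {
    assert(ch < 128);  /* caller guarantees an ASCII string */
    return &ascii_cache[ch];
}

/* Latin-1 lookup: must first test ch < 128 to pick the array. */
static char_obj *latin1_char(uint8_t ch) {
    return ch < 128 ? &ascii_cache[ch] : &latin1_cache[ch - 128];
}
```

`latin1_char` pays for the `ch < 128` test on every call. That test is the per-iteration branch kumaraditya303 wanted to keep off the ASCII path.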
How does this affect performance when ASCII and non-ASCII characters are mixed in the same string?
Oh, I see. That's a bit unfortunate, but I see your point, and I guess ASCII strings are somewhat special anyway.
In that case, the representation of the whole string will not use the "compact ASCII" format, so the regular (slower) iterator is used.
@kumaraditya303 Please address the other review comments.
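Why does one non-ASCII character take the entire string off the compact ASCII path? The layout is chosen from the largest code point in the whole string, not character by character. Below is a minimal sketch of that selection, modeled on the thresholds `PyUnicode_New()` uses for its `maxchar` argument (128, 256, 65536). `repr`, `choose_repr`, and `uses_fast_iterator` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the four compact representations. */
typedef enum { REPR_ASCII, REPR_UCS1, REPR_UCS2, REPR_UCS4 } repr;

/* The layout depends only on the largest code point in the string. */
static repr choose_repr(const uint32_t *cps, size_t n) {
    assert(cps != NULL || n == 0);
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (cps[i] > max)
            max = cps[i];
    if (max < 0x80)
        return REPR_ASCII;
    if (max < 0x100)
        return REPR_UCS1;
    if (max < 0x10000)
        return REPR_UCS2;
    return REPR_UCS4;
}

/* In this sketch, only the ASCII layout gets the specialized iterator. */
static int uses_fast_iterator(const uint32_t *cps, size_t n) {
    return choose_repr(cps, n) == REPR_ASCII;
}
```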
Added some tests and addressed the comments.
🤖 New build scheduled with the buildbot fleet by @kumaraditya303 for commit ad2d676 🤖
If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.
🤖 New build scheduled with the buildbot fleet by @kumaraditya303 for commit 56d110c 🤖
If you want to schedule another build, you need to add the "🔨 test-with-buildbots" label again.
Benchmark Script:
Results:
Closes #91576