gh-95382: Use cache for indentations in the JSON encoder#118636
gh-95382: Use cache for indentations in the JSON encoder#118636serhiy-storchaka merged 8 commits intopython:mainfrom
Conversation
8e82b42 to
2e915fd
Compare
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
| if (self->indent != Py_None) { | ||
| newline_indent = _create_newline_indent(self->indent, indent_level); | ||
| if (newline_indent == NULL) { | ||
| indent_cache = create_indent_cache(self, indent_level); |
There was a problem hiding this comment.
When called via the public interface indent_level is always zero. By removing the argument indent_level in the internal implementation this call and some other bits of the code can be a bit simplified.
(this is probably not worth creating a separate PR for, I leave it up to you)
| write_newline_indent(_PyUnicodeWriter *writer, | ||
| Py_ssize_t indent_level, PyObject *indent_cache) | ||
| { | ||
| PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2); |
There was a problem hiding this comment.
| PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2); | |
| assert(indent_level * 2 < PyList_GET_SIZE(indent_cache)); | |
| PyObject *newline_indent = PyList_GET_ITEM(indent_cache, indent_level * 2); |
There was a problem hiding this comment.
The assert above is placed after conditional resizing the cache. It is useful to check that we always get the right cache size, independently from conditions. But here we do not change the cache.
| return NULL; | ||
| } | ||
| } | ||
| assert(indent_level * 2 < PyList_GET_SIZE(indent_cache)); |
There was a problem hiding this comment.
| assert(indent_level * 2 < PyList_GET_SIZE(indent_cache)); | |
| assert(indent_level * 2 - 1 < PyList_GET_SIZE(indent_cache)); |
There was a problem hiding this comment.
indent_level * 2 < PyList_GET_SIZE(indent_cache) is more strong condition.
eendebakpt
left a comment
There was a problem hiding this comment.
The PR adds some extra code for creation and usage of the cache, but this simplifies the error handling in the methods using the cache. The performance improvement is a bit larger than I expected which is nice, although it is also fair to say that the performance when writing json files with indent != None is probably not the most important.
It is a continuation of #118105 which keeps a cache of newline+indentation and comma+newline+indentation at every level.
For example:
The effect is the strongest if there are many deep narrow trees growing from the root:
And the weakest (up to no difference) if the tree contains few large flat lists or dicts.