GH-106485: Dematerialize instance dictionaries when possible#106539
GH-106485: Dematerialize instance dictionaries when possible#106539brandtbucher merged 22 commits intopython:mainfrom
Conversation
markshannon
left a comment
There was a problem hiding this comment.
I think this does the right thing, and does it the right way.
But I don't know if it does it at the right time, or in the right place.
In other words, are we dematerializing in the right instruction, and in such as way as to not degrade specialization?
I guess the measure of that is the stats.
Python/bytecodes.c
Outdated
| DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR); | ||
| res = _PyDictOrValues_GetValues(dorv)->values[index]; | ||
| PyDictOrValues *dorv = _PyObject_DictOrValuesPointer(owner); | ||
| if (!_PyDictOrValues_IsValues(*dorv)) { |
There was a problem hiding this comment.
I don't think we want this big chunk of code inline here, it's going to mess up the tier 2 optimizer (it won't help the compiler generate code for the tier 1 interpreter either).
How about DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR_INSTANCE_DEMATERIALIZE)
and make another instruction LOAD_ATTR_INSTANCE_DEMATERIALIZE to do the dematerialization?
Python/specialize.c
Outdated
| if (_PyDictOrValues_IsValues(dorv)) { | ||
| // Virtual dictionary | ||
| unmaterializing: | ||
| ; // Load-bearing semicolon; don't touch! |
There was a problem hiding this comment.
Maybe // C doesn't allow declarations immediately after a label?
| return 0; | ||
| } | ||
| // We found an instance with a __dict__. | ||
| if (dict->ma_values) { |
There was a problem hiding this comment.
This is quite a complex condition. The goto makes it even harder to follow.
Could you compute a virtual_dict flag, then branch on that?
|
The stats still show "has managed dict" as the primary reason for specialization failure. That suggests to me that we need to dematerialize in If that is the case we should be able to get the specialization rate for |
__dict__s in LOAD_ATTR_INSTANCE_VALUE|
Just kicked off a stats/benchmarking job for the new approach. Should be done soon. |
|
Stats comparison vs base and benchmark comparison vs base. On my phone, so not able to dig into the results right now. |
markshannon
left a comment
There was a problem hiding this comment.
The stats are puzzling.
This hugely improves the specialization success rate, but make almost no difference to the number of unspecialized LOAD_ATTRs executed.
| assert(Py_TYPE(owner)->tp_flags & Py_TPFLAGS_MANAGED_DICT); | ||
| PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(owner); | ||
| DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR); | ||
| PyDictOrValues *dorv = _PyObject_DictOrValuesPointer(owner); |
There was a problem hiding this comment.
This is turning:
test_bit = owner[-4] & 1
jump if test_bit -> slow_path
into
test_bit = owner[-4] & 1
jump if not test_bit -> following
spill-registers
call _PyObject_MakeInstanceAttributesFromDict
restore-registers
jump if return-register ->following
jump -> slow_path
following:
or, which is even worse:
test_bit = owner[-4] & 1
spill-registers
jump if not test_bit -> following
call _PyObject_MakeInstanceAttributesFromDict
jump if return-register ->following
jump -> slow_path
following:
restore-registers
I think this will need to be
DEOPT_IF(!_PyDictOrValues_IsValues(*dorv), LOAD_ATTR_DEMATERIALIZE)To keep the slow-path from messing up the code.
There was a problem hiding this comment.
This really complicates the code (and probably needs ugly special-casing in the resulting uop trace), since LOAD_ATTR_DEMATERIALIZE would need to jump back into (or repeat) the instruction we happen to be executing after a successful dematerialization.
Can we leave that to another PR, if we determine that the compiler indeed hasn't just moved this cold branch with the register spills out-of-line (like it should, under PGO)?
test_bit = owner[-4] & 1
jump if test_bit -> cold
following:
...
cold:
spill-registers
call _PyObject_MakeInstanceAttributesFromDict
restore-registers
jump if return-register ->following
jump -> slow_path
Python/bytecodes.c
Outdated
| assert(self_cls->tp_flags & Py_TPFLAGS_MANAGED_DICT); | ||
| PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(self); | ||
| DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR); | ||
| PyDictOrValues *dorv = _PyObject_DictOrValuesPointer(self); |
There was a problem hiding this comment.
Same comment as for LOAD_ATTR_INSTANCE_VALUES
Python/bytecodes.c
Outdated
| assert(self_cls->tp_flags & Py_TPFLAGS_MANAGED_DICT); | ||
| PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(self); | ||
| DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR); | ||
| PyDictOrValues *dorv = _PyObject_DictOrValuesPointer(self); |
We're only dematerializing about 5M dicts, and the unspecialized As a side note, I've learned to mostly ignore the specialization success/failure stats when comparing branches like this, since repeated specialization attempts (and exponential backoff) cause failure rates to grow or shrink disproportionately from the actual number of specialized sites. In this case, the number of successes drops by 236k and the number of failures drop by over a million. But the percentages say we actually have 5% more successes now. It's sort of confusing to derive much meaning from those sorts of numbers. 🙃 |
|
Confirmed that the weird |
| if (_PyType_HasFeature(tp, Py_TPFLAGS_MANAGED_DICT)) { | ||
| OBJECT_STAT_INC(dict_materialized_on_request); | ||
| } |
There was a problem hiding this comment.
@markshannon BTW, this was the missing STAT_INC. Anything that sets a custom tp_new and has Py_TPFLAGS_MANAGED_DICT doesn't get its inline values initialized, and has a NULL dict. We handle it fine, but we may want to consider treating a NULL dict as "has values that need initializing" rather than "has a dict that needs initializing".
There was a problem hiding this comment.
Turns out, this is a major source of dict materializations in the benchmarks (about ~4 million dicts that we weren't counting before). It looks like we're creating all of these just to dematerialize them almost immediately.
| } | ||
| // It's likely that this dict still shares its keys (if it was materialized | ||
| // on request and not heavily modified): | ||
| assert(PyDict_CheckExact(dict)); |
There was a problem hiding this comment.
Maybe this should allow dict subclasses?
Here's an alternative to #106496, using
LOAD_ATTR_INSTANCE_VALUE.__dict__s #106485