Optimize fast_locals and atomic ordering#7289
Conversation
- inc/inc_by/get: SeqCst → Relaxed - safe_inc CAS: SeqCst → Relaxed + compare_exchange_weak - dec: SeqCst → Release + Acquire fence when count drops to 0 - leak CAS: SeqCst → AcqRel/Relaxed + compare_exchange_weak
Replace vec![self_val] + extend(args.args) with FuncArgs::prepend_arg() to avoid a second heap allocation on every method call.
Early return in PyCallable::invoke() when use_tracing is false, avoiding two downcast_ref type checks on every function call.
📝 WalkthroughWalkthroughThis PR introduces lock-free frame locals storage via a new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin worktree-perf-step1 |
Eliminate per-instruction mutex lock/unlock overhead for local variable access. FastLocals uses UnsafeCell with safety guaranteed by the frame's state mutex and sequential same-thread execution. Affects 14+ lock() call sites in hot instruction paths (LoadFast, StoreFast, DeleteFast, and their paired variants).
2a5b8ce to
958853f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/common/src/refcount.rs (1)
124-137:⚠️ Potential issue | 🟠 MajorUse
Acquireon successfulsafe_incCAS for weak→strong upgrade synchronization.The
safe_incmethod is used in weak reference upgrade paths (e.g.,PyWeak::upgradeat line 688 incrates/vm/src/object/core.rs). Currently, line 136 usesOrdering::Relaxedon successful CAS, which provides no synchronization guarantees.When upgrading from weak to strong, the CAS success must acquire visibility of memory written by previous strong reference holders. The
dec()method usesOrdering::Release(line 149), establishing a release-acquire synchronization point. The successfulsafe_incmust useOrdering::Acquireto properly pair with this Release and ensure visibility of modifications made under the previous strong reference.Proposed patch
match self.state.compare_exchange_weak( old.as_raw(), new_state.as_raw(), - Ordering::Relaxed, + Ordering::Acquire, Ordering::Relaxed, ) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/common/src/refcount.rs` around lines 124 - 137, The CAS in safe_inc (the compare_exchange_weak call that uses old.as_raw/new_state.as_raw) currently uses Ordering::Relaxed for the success case; change the success ordering to Ordering::Acquire while keeping the failure ordering relaxed, so the weak→strong upgrade acquires the release from dec() and sees prior writes. Locate safe_inc, the State::from_raw / add_strong usage and adjust the compare_exchange_weak success ordering to Acquire.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/vm/src/frame.rs`:
- Around line 96-133: The FastLocals UnsafeCell can be accessed concurrently via
the Python-exposed f_locals() path (which calls locals()), violating the
assumption that access is serialized by with_exec(); to fix, either (A) remove
the unsafe impls unsafe impl Send/Sync for FastLocals and add runtime assertions
to prevent cross-thread access, or (B) gate all fastlocals borrows behind the
frame execution guard/token (have f_locals() acquire the same with_exec() guard
or a borrow token before calling locals()/FastLocals::borrow/borrow_mut), or (C)
add a runtime thread-check that panics on unsynchronized cross-thread access;
update the FastLocals docs to reflect the chosen model and adjust callers
(f_locals, locals, and any direct FastLocals usage) to obtain the guard/token or
to compile without Send/Sync accordingly.
---
Outside diff comments:
In `@crates/common/src/refcount.rs`:
- Around line 124-137: The CAS in safe_inc (the compare_exchange_weak call that
uses old.as_raw/new_state.as_raw) currently uses Ordering::Relaxed for the
success case; change the success ordering to Ordering::Acquire while keeping the
failure ordering relaxed, so the weak→strong upgrade acquires the release from
dec() and sees prior writes. Locate safe_inc, the State::from_raw / add_strong
usage and adjust the compare_exchange_weak success ordering to Acquire.
ℹ️ Review info
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
crates/common/src/refcount.rscrates/vm/src/builtins/frame.rscrates/vm/src/builtins/function.rscrates/vm/src/builtins/super.rscrates/vm/src/frame.rscrates/vm/src/protocol/callable.rs
| /// Lock-free storage for local variables (localsplus). | ||
| /// | ||
| /// # Safety | ||
| /// Access is serialized by the frame's state mutex in `with_exec()`, which | ||
| /// prevents concurrent frame execution. Trace callbacks that access `f_locals` | ||
| /// run sequentially on the same thread as instruction execution. | ||
| pub struct FastLocals { | ||
| inner: UnsafeCell<Box<[Option<PyObjectRef>]>>, | ||
| } | ||
|
|
||
| // SAFETY: Frame execution is serialized by the state mutex. | ||
| #[cfg(feature = "threading")] | ||
| unsafe impl Send for FastLocals {} | ||
| #[cfg(feature = "threading")] | ||
| unsafe impl Sync for FastLocals {} | ||
|
|
||
| impl FastLocals { | ||
| fn new(data: Box<[Option<PyObjectRef>]>) -> Self { | ||
| Self { | ||
| inner: UnsafeCell::new(data), | ||
| } | ||
| } | ||
|
|
||
| /// # Safety | ||
| /// Caller must ensure exclusive access (frame state locked or frame | ||
| /// not executing). | ||
| #[inline(always)] | ||
| pub unsafe fn borrow(&self) -> &[Option<PyObjectRef>] { | ||
| unsafe { &*self.inner.get() } | ||
| } | ||
|
|
||
| /// # Safety | ||
| /// Caller must ensure exclusive mutable access. | ||
| #[inline(always)] | ||
| #[allow(clippy::mut_from_ref)] | ||
| pub unsafe fn borrow_mut(&self) -> &mut [Option<PyObjectRef>] { | ||
| unsafe { &mut *self.inner.get() } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "1) Enumerate all fastlocals unsafe borrow call sites"
rg -n 'fastlocals\.borrow(_mut)?\(' \
crates/vm/src/frame.rs \
crates/vm/src/builtins/frame.rs \
crates/vm/src/builtins/function.rs \
crates/vm/src/builtins/super.rs
echo
echo "2) Inspect synchronization context around main externally reachable paths"
rg -n -A25 -B8 'fn locals\(|fn clear\(|fn f_locals\(|fn with_exec\(' \
crates/vm/src/frame.rs \
crates/vm/src/builtins/frame.rs
echo
echo "3) Inspect owner synchronization primitives (load/store vs CAS)"
rg -n 'owner\.(load|store|compare_exchange|compare_exchange_weak)' \
crates/vm/src/frame.rs \
crates/vm/src/builtins/frame.rsRepository: RustPython/RustPython
Length of output: 9197
FastLocals synchronization contract is violated by externally reachable paths; f_locals() getter permits concurrent unsynchronized access.
The doc comment claims access is "serialized by the frame's state mutex in with_exec()", but the f_locals() getter (line 456 in builtins/frame.rs) directly calls locals() which accesses fastlocals without holding that guard (line 395 in frame.rs). The comment at lines 405–406 acknowledges fastlocals is intentionally accessed outside with_exec() to avoid blocking. Since f_locals() is Python-exposed and callable from any thread, and FastLocals is marked Send + Sync, concurrent calls from multiple threads will perform unsynchronized reads/writes to the same UnsafeCell, violating Rust's aliasing guarantees.
Gate fastlocals borrows behind a guard/token tied to frame execution, or document that frames must not be accessed concurrently and remove Send + Sync (or add runtime checks that panic if accessed cross-thread).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/vm/src/frame.rs` around lines 96 - 133, The FastLocals UnsafeCell can
be accessed concurrently via the Python-exposed f_locals() path (which calls
locals()), violating the assumption that access is serialized by with_exec(); to
fix, either (A) remove the unsafe impls unsafe impl Send/Sync for FastLocals and
add runtime assertions to prevent cross-thread access, or (B) gate all
fastlocals borrows behind the frame execution guard/token (have f_locals()
acquire the same with_exec() guard or a borrow token before calling
locals()/FastLocals::borrow/borrow_mut), or (C) add a runtime thread-check that
panics on unsynchronized cross-thread access; update the FastLocals docs to
reflect the chosen model and adjust callers (f_locals, locals, and any direct
FastLocals usage) to obtain the guard/token or to compile without Send/Sync
accordingly.
Summary by CodeRabbit
Release Notes