Skip to content

Preserve str subclasses in ascii() for ASCII-only __repr__ results#7455

Merged
youknowone merged 4 commits intomainfrom
copilot/fix-ascii-discards-str-subclass
Mar 19, 2026
Merged

Preserve str subclasses in ascii() for ASCII-only __repr__ results#7455
youknowone merged 4 commits intomainfrom
copilot/fix-ascii-discards-str-subclass

Conversation

Copy link
Contributor

Copilot AI commented Mar 17, 2026

ascii() was always materializing a new plain str, which discarded str subclass types even when __repr__ had already returned an ASCII-only string. This diverged from CPython, which returns the original repr object unchanged in that case.

  • Behavior change

    • Update builtins.ascii() and PyObject::ascii() to short-circuit when repr(obj) is already ASCII.
    • Return the original repr object directly in that branch, preserving str subclass identity.
    • Keep the existing escape path for non-ASCII repr output, which still produces a new escaped plain str.
  • Call site updates

    • Adjust internal %a / format-conversion consumers to work with a returned PyStr object instead of an AsciiString.
    • Preserve existing formatting behavior while allowing subclass-preserving results to flow through unchanged where appropriate.
  • Regression coverage

    • Unskip the existing CPython-derived test_ascii case in Lib/test/test_str.py.
    • Add focused snippet coverage for both branches:
      • ASCII repr from a str subclass preserves the subclass
      • non-ASCII repr still escapes to a plain str
class MyStr(str):
    pass

class Foo:
    def __repr__(self):
        return MyStr("hello")

class Bar:
    def __repr__(self):
        return MyStr("héllo")

type(ascii(Foo()))  # MyStr
type(ascii(Bar()))  # str
Original prompt

This section details on the original issue you should resolve

<issue_title>ascii() discards str subclass type when repr result is already ASCII</issue_title>
<issue_description>## Description

ascii() always creates a new plain str from the repr result, even when the repr is already ASCII. CPython's PyObject_ASCII returns the repr object directly when it's already ASCII, preserving str subclass types. This is a CPython compatibility issue.

Root Cause

builtin_ascii() unconditionally converts the repr result to a new AsciiString via to_ascii(), which then becomes a plain str through ToPyObject. The same pattern exists in PyObject::ascii().

CPython reference: object.c#L848-L849

// Key branch: return repr directly when it's already ASCII
if (PyUnicode_IS_ASCII(repr))
    return repr;

Fix

Add an ASCII check before to_ascii() — if the repr is already ASCII, return the repr object directly instead of creating a new string.

Reproduction

class MyStr(str):
    pass

class Foo:
    def __repr__(self):
        return MyStr('hello')

result = ascii(Foo())
print(type(result))

Output

RustPython:

<class 'str'>

CPython:

<class '__main__.MyStr'>

Environment

  • RustPython d248a04 (Python 3.14.0)
  • CPython 3.14.3
  • OS: Debian 12</issue_description>

Comments on the Issue (you are @copilot in this section)


📱 Kick off Copilot coding agent tasks wherever you are with GitHub Mobile, available on iOS and Android.

Summary by CodeRabbit

  • Bug Fixes

    • Refined the ascii() builtin function to properly handle cases where __repr__ returns a string subclass, preserving the return type when the representation is already in ASCII format.
  • Tests

    • Added test coverage for ascii() behavior with custom string subclasses.

Co-authored-by: youknowone <69878+youknowone@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix ascii() to preserve str subclass types Preserve str subclasses in ascii() for ASCII-only __repr__ results Mar 17, 2026
Copilot AI requested a review from youknowone March 17, 2026 11:32
@github-actions
Copy link
Contributor

github-actions bot commented Mar 17, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[x] test: cpython/Lib/test/test_str.py (TODO: 15)
[x] test: cpython/Lib/test/test_fstring.py (TODO: 19)
[x] test: cpython/Lib/test/test_string_literals.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on str)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

assert ascii(chr(0x9999)) == "'\\u9999'"
assert ascii(chr(0x0A)) == "'\\n'"

class MyStr(str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[auto-format] reported by reviewdog 🐶

Suggested change
class MyStr(str):
class MyStr(str):

class MyStr(str):
pass

class Foo:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[auto-format] reported by reviewdog 🐶

Suggested change
class Foo:
class Foo:

def __repr__(self):
return MyStr("hello")

class Bar:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[auto-format] reported by reviewdog 🐶

Suggested change
class Bar:
class Bar:

def __repr__(self):
return MyStr("héllo")

assert type(ascii(Foo())) is MyStr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[auto-format] reported by reviewdog 🐶

Suggested change
assert type(ascii(Foo())) is MyStr
assert type(ascii(Foo())) is MyStr

@youknowone
Copy link
Member

@copilot Fix CI failures

Co-authored-by: youknowone <69878+youknowone@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 17, 2026

@copilot Fix CI failures

Fixed in 6803cee. The CI failures were caused by ruff format --check flagging missing PEP 8 blank lines (two blank lines before top-level class definitions) in extra_tests/snippets/builtin_ascii.py. Applied ruff format to fix the formatting.

@youknowone youknowone marked this pull request as ready for review March 18, 2026 12:50
@youknowone
Copy link
Member

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 18, 2026

📝 Walkthrough

Walkthrough

This PR modifies the ascii() function and related conversion paths to preserve str subclass types when the repr is already ASCII. Instead of always creating a new plain string, the function now returns the repr object directly if it's already ASCII, aligning with CPython's behavior.

Changes

Cohort / File(s) Summary
Builtin and Protocol ASCII Changes
crates/vm/src/stdlib/builtins.rs, crates/vm/src/protocol/object.rs
Changed return type from ascii::AsciiString to PyRef<PyStr>. Both functions now check if repr is already ASCII and return it directly; otherwise, construct a new PyStr from the ASCII representation. Preserves str subclass types when repr is already ASCII.
Format Path Updates
crates/vm/src/cformat.rs, crates/vm/src/format.rs, crates/vm/src/frame.rs
Updated to accommodate the new ASCII return type. Removed intermediate vm.ctx.new_str(...) wrapping steps; instead directly convert results. Uses as_bytes().to_vec() and as_wtf8().to_owned() for appropriate conversions.
Test Coverage
extra_tests/snippets/builtin_ascii.py
Added test cases for str subclass preservation, including MyStr subclass with ASCII and non-ASCII __repr__ methods to verify behavior aligns with CPython.

Sequence Diagram

sequenceDiagram
    actor User
    participant ascii as ascii() Function
    participant repr as repr() Protocol
    participant check as ASCII Check
    participant return_direct as Return Repr<br/>(subclass preserved)
    participant escape as Escape Non-ASCII
    participant return_new as Return New PyStr

    User->>ascii: ascii(obj)
    ascii->>repr: Get __repr__(obj)
    repr-->>ascii: repr_result (PyRef<PyStr>)
    ascii->>check: Is repr already ASCII?
    alt Already ASCII
        check->>return_direct: Yes
        return_direct-->>ascii: repr_result unchanged
        ascii-->>User: PyRef<PyStr> (subclass preserved)
    else Non-ASCII
        check->>escape: No, escape non-ASCII chars
        escape->>return_new: Create new plain PyStr
        return_new-->>ascii: new PyStr
        ascii-->>User: PyRef<PyStr> (plain str)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • Align f-string related bytecodes with 3.13 #6321: Modifies the same VM ASCII conversion path and ConvertValue/ConvertValueOparg handling in crates/vm/src/frame.rs, with related adjustments to how ascii() results are produced.
  • Refactor PyUtf8Str #6047: Updates the same string/representation APIs including PyObject::ascii, format paths, and uses of PyStr/PyRef types.

Poem

🐰 The ASCII string hops with glee,
Subclass types now run wild and free!
No needless wrapping, straight and true,
CPython's path for me and you! 🌿✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: preserving str subclasses in ascii() when repr returns ASCII-only strings, which is the core objective of the PR.
Linked Issues check ✅ Passed The PR implements all objectives from issue #7451: it detects ASCII-only repr objects and returns them directly to preserve str subclass types, only escaping non-ASCII repr to plain str, with updated call sites and comprehensive tests.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issue: modifying builtins.ascii() and PyObject::ascii() signatures, updating internal call sites, and adding test coverage. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch copilot/fix-ascii-discards-str-subclass
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/vm/src/stdlib/builtins.rs (1)

67-74: Code duplication with PyObject::ascii() in object.rs.

The implementation here is identical to PyObject::ascii() at lines 373-379 in crates/vm/src/protocol/object.rs. Consider having one delegate to the other to avoid maintaining the same logic in two places.

♻️ Option: Delegate to PyObject::ascii()
 #[pyfunction]
 pub fn ascii(obj: PyObjectRef, vm: &VirtualMachine) -> PyResult<PyStrRef> {
-    let repr = obj.repr(vm)?;
-    if repr.as_wtf8().is_ascii() {
-        Ok(repr)
-    } else {
-        Ok(vm.ctx.new_str(to_ascii(repr.as_wtf8())))
-    }
+    obj.ascii(vm)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/stdlib/builtins.rs` around lines 67 - 74, Replace the
duplicated logic in builtins::ascii (the function pub fn ascii in builtins.rs)
by delegating to the existing PyObject::ascii() implementation on the object
(call the object's ascii method with the same vm) and return its
PyResult<PyStrRef>; remove the manual repr/as_wtf8/is_ascii/to_ascii flow so the
builtin just forwards to obj.ascii(vm) (adjust imports/signature if needed to
match the method call).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/vm/src/stdlib/builtins.rs`:
- Around line 67-74: Replace the duplicated logic in builtins::ascii (the
function pub fn ascii in builtins.rs) by delegating to the existing
PyObject::ascii() implementation on the object (call the object's ascii method
with the same vm) and return its PyResult<PyStrRef>; remove the manual
repr/as_wtf8/is_ascii/to_ascii flow so the builtin just forwards to
obj.ascii(vm) (adjust imports/signature if needed to match the method call).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 5fb0f8c6-cca6-48d9-a5fc-c96d61bd3787

📥 Commits

Reviewing files that changed from the base of the PR and between 2a61237 and 6803cee.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_str.py is excluded by !Lib/**
📒 Files selected for processing (6)
  • crates/vm/src/cformat.rs
  • crates/vm/src/format.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/protocol/object.rs
  • crates/vm/src/stdlib/builtins.rs
  • extra_tests/snippets/builtin_ascii.py

@youknowone
Copy link
Member

@copilot check review #7455 (review)
Delegate to PyObject::ascii()

Co-authored-by: youknowone <69878+youknowone@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 18, 2026

@copilot check review #7455 (review)
Delegate to PyObject::ascii()

Done in 77767fc. builtin_ascii() now delegates to obj.ascii(vm) and the to_ascii import was removed from builtins.rs.

Copilot AI requested a review from youknowone March 18, 2026 13:22
@youknowone youknowone merged commit 2f181ef into main Mar 19, 2026
17 checks passed
@youknowone youknowone deleted the copilot/fix-ascii-discards-str-subclass branch March 19, 2026 00:48
youknowone pushed a commit to youknowone/RustPython that referenced this pull request Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ascii() discards str subclass type when repr result is already ASCII

2 participants