Conversation


@ShaharNaveh ShaharNaveh commented Aug 11, 2025

The idea is to have a file that defines which tests need to be marked with what (expectedFailure, skipIf, etc.), and code that automatically applies those patches to a given test file.

This is a POC only; I have plans to make it more modular using argparse and some flags, but the general gist will remain the same.

This tool helps with updating test files from CPython.

You can run it like this:

./scripts/lib_updater.py --from Lib/test/foo.py --to ~/cpython/Lib/test/foo.py -o Lib/test/foo.py

to move the patches found in Lib/test/foo.py onto ~/cpython/Lib/test/foo.py and write the result back to Lib/test/foo.py (omit the -o arg to print the new contents to stdout instead)

You can also run it with:

./scripts/lib_updater.py --from Lib/test/foo.py --show-patches -o my_patches.json

to get a baseline of patches. You can alter the patches file with your favorite tool/script/etc., then reapply it with:

./scripts/lib_updater.py -p my_patches.json --to Lib/test/foo.py
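The patches file is plain JSON, so any scripting round-trip works. A minimal sketch of that edit step — the schema shown here is an assumption for illustration, not the tool's documented format:

```python
import json

# The inner structure below is assumed for illustration only; inspect the
# real output of --show-patches for the actual schema.
patches = {
    "test_utime": {
        "decorator": "unittest.expectedFailure",
        "reason": "TODO: RUSTPYTHON",
    },
}

# Edit with any tool -- here we just amend a reason and round-trip via JSON.
patches["test_utime"]["reason"] = "TODO: RUSTPYTHON; utime rounding"
serialized = json.dumps(patches, indent=2)
assert json.loads(serialized) == patches
print(serialized)
```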

cc: @youknowone @arihant2math @moreal @reactive-firewall would love to hear your feedback

ref: #5974


This PR also updates test_os.py from 3.13.7 btw

Summary by CodeRabbit

  • New Features
    • Adds a standalone CLI to analyze unittest tests and generate/apply structured patch blocks.
    • Can derive patches from source or load from JSON; includes a preview mode to show patches without applying.
    • Preserves indentation and insertion points, supports in-place updates or printing results, and emits wrapper entries for inherited tests.
    • Handles platform-specific conditions and warns about missing or unmatched test references.


coderabbitai bot commented Aug 11, 2025

Walkthrough

Adds a new CLI script scripts/lib_updater.py that parses unittest modules via AST to extract skip/xfail decorators into structured PatchSpec/PatchEntry objects, builds patch blocks for existing and inherited tests, and can show, serialize, or apply those patches to a target file (optionally in-place).

Changes

Cohort / File(s) Summary of changes
New CLI tool for unittest patching
scripts/lib_updater.py
Adds a standalone CLI implementing Patches type alias, indentation/comment constants (COL_OFFSET, INDENT1, INDENT2, COMMENT), UtMethod enum, PatchSpec and PatchEntry types; test enumeration (iter_tests, iter_patches), patch extraction (PatchEntry.iter_patch_entires), patch dict builder (build_patch_dict), patch line generator (iter_patch_lines), patch applier (apply_patches), output writer (write_output), CLI builder (build_argparse), and runtime behavior supporting --patches, --show-patches, and --inplace.
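The decorator-extraction core can be pictured in a few lines of `ast`. This is an independent sketch of the idea, not the script's actual code:

```python
import ast

# Walk a unittest module and record which skip/xfail decorators each test has.
SRC = '''
import unittest

class FileTests(unittest.TestCase):
    @unittest.expectedFailure
    def test_fstat(self):
        pass

    @unittest.skipIf(True, "flaky")
    def test_access(self):
        pass
'''

found = []
tree = ast.parse(SRC)
for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
    for fn in (n for n in cls.body if isinstance(n, ast.FunctionDef)):
        for dec in fn.decorator_list:
            # @unittest.expectedFailure  -> ast.Attribute
            # @unittest.skipIf(...)      -> ast.Call wrapping an Attribute
            attr = dec.func if isinstance(dec, ast.Call) else dec
            if isinstance(attr, ast.Attribute):
                found.append((cls.name, fn.name, attr.attr))
print(found)
```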

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as lib_updater CLI
  participant FS as FileSystem
  participant AST as AST Parser
  participant PG as Patch Generator

  User->>CLI: run (orig_file, [remote_file], flags)
  CLI->>FS: read orig_file
  CLI->>AST: parse orig_file
  AST-->>CLI: AST tree
  CLI->>PG: iter_patches(tree) or load --patches JSON
  PG-->>CLI: patches dict

  alt --show-patches
    CLI-->>User: print patches JSON
  else Apply patches
    CLI->>FS: read remote_file (or orig_file)
    CLI->>AST: parse remote file
    AST-->>CLI: remote AST
    CLI->>PG: iter_patch_lines(remote_tree, patches)
    PG-->>CLI: [(line_no, patch_block)]
    CLI->>FS: write modified content (if --inplace)
    CLI-->>User: print modified content (if not --inplace)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I nibble lines with careful bite,
Through AST burrows late at night,
I stamp skips and xfails with cheer,
Patch blocks hop in, tidy and clear,
Tests wake brighter — code takes flight. 🐰✨


# Convert a number of nanosecond (int) to a number of seconds (float).
# Round towards infinity by adding 0.5 nanosecond to avoid rounding
# issue, os.utime() rounds towards minus infinity.
# XXX: RUSTPYTHON os.utime() use `[Duration::from_secs_f64](https://doc.rust-lang.org/std/time/struct.Duration.html#method.try_from_secs_f64)`

@ShaharNaveh ShaharNaveh Aug 11, 2025


I had this crazy idea to store patch files in the repo for patches like this, similar to what Termux does for their packages, for example:
https://github.com/termux/termux-packages/blob/9865d958666e04f16201a0774a646f37b6082c80/packages/python/0001-fix-hardcoded-paths.patch


That's probably out of scope for this PR, though.


btw, your idea is not so crazy; C is NOT Rust after all, so it might be needed, but yeah, totally out of scope for this PR.

what about an empty patch folder for now? TL;DR 🤔 I think patches for stuff like this might be the right idea, and git can even generate them for us, especially if we can make them configurable (knobs like os/arch pair, patch-id, cpython version, and the number of context lines needed to prevent future regressions come to mind). For now we could just have a patch folder (empty, but as a placeholder).

@ShaharNaveh ShaharNaveh marked this pull request as ready for review August 11, 2025 15:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (3)
tools/test_marker/main.py (3)

57-57: Simplify redundant string operations.

The chained strip operations are redundant. Consider simplifying.

-    return res.strip().rstrip(";").strip()
+    return res.rstrip(";").strip()

148-148: Fix typo in comment.

-    # Phase 2: Iterate and mark inhereted tests
+    # Phase 2: Iterate and mark inherited tests

191-191: Fix typo in comment.

-    # Going in reverse to not distrupt the lines offset
+    # Going in reverse to not disrupt the line offsets
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9a9e3b and 9622114.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_os.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • tools/test_marker/confs/test_os.toml (1 hunks)
  • tools/test_marker/main.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)

**/*.py: Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • tools/test_marker/main.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : When tests fail due to unsupported syntax, keep the test as `unittest.expectedFailure`, document that it requires PEP 695 support, and focus on tests that can be fixed through Rust code changes only
🪛 Flake8 (7.2.0)
tools/test_marker/main.py

[error] 19-19: SyntaxError: invalid syntax

(E999)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
🔇 Additional comments (2)
tools/test_marker/confs/test_os.toml (1)

1-4: LGTM!

The path declaration and FileTests section are well-structured with descriptive reasons.

tools/test_marker/main.py (1)

163-169: Verify insertion point for inherited test methods.

Using cls_node.end_lineno directly might place the new methods at the wrong indentation level or inside the last method of the class. Consider using cls_node.end_lineno - 1 or adjusting based on the actual class structure.
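The concern is easy to reproduce with a quick independent probe of what `end_lineno` actually points at:

```python
import ast

src = (
    "class T:\n"
    "    def test_a(self):\n"
    "        pass\n"
)
cls = ast.parse(src).body[0]
# end_lineno is the last line of the class body (here the `pass` inside
# test_a), so inserting at that line without adjustment would land inside
# the final method rather than after the class.
print(cls.end_lineno)                        # 3
print(src.splitlines()[cls.end_lineno - 1])  # the indented `pass` line
```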

Comment on lines 19 to 20
type Patch = dict[str, dict[str, str]]
type Conf = dict[str, Patch]

⚠️ Potential issue

Use pre-3.12 compatible type alias syntax.

The type statement for type aliases requires Python 3.12+. Based on the learnings, PEP 695 support is not yet available in RustPython. Use TypeAlias from typing for compatibility.

-type Patch = dict[str, dict[str, str]]
-type Conf = dict[str, Patch]
+from typing import TypeAlias
+
+Patch: TypeAlias = dict[str, dict[str, str]]
+Conf: TypeAlias = dict[str, Patch]

🤖 Prompt for AI Agents
In tools/test_marker/main.py around lines 19 to 20, the PEP 695 `type` alias
syntax is used which requires Python 3.12+; replace it with typing.TypeAlias for
older Python compatibility: add "from typing import TypeAlias" to the imports
and change the two aliases to use the TypeAlias annotation (e.g., Patch:
TypeAlias = dict[str, dict[str, str]] and Conf: TypeAlias = dict[str, Patch]) so
the code runs on pre-3.12 interpreters.

Comment on lines 86 to 87
if attr_node.value.id != "unittest":
return False

@coderabbitai coderabbitai bot Aug 11, 2025


⚠️ Potential issue

Add safety check for attr_node.value.

Accessing attr_node.value.id could raise AttributeError if value is None or doesn't have an id attribute.

-    if attr_node.value.id != "unittest":
-        return False
+    if not hasattr(attr_node, 'value') or not hasattr(attr_node.value, 'id'):
+        return False
+    if attr_node.value.id != "unittest":
+        return False
🤖 Prompt for AI Agents
In tools/test_marker/main.py around lines 86 to 87, the code directly accesses
attr_node.value.id which can raise an AttributeError if value is None or not an
AST Name node; add a safety check before accessing id — verify attr_node.value
is not None and is an ast.Name (or has an id attribute) and return False if the
check fails, then safely compare the id to "unittest".


@ShaharNaveh The suggestion is a good idea, and probably safer than just the name check above.

the details aside, the feature of checking for patching is good (ignoring scope of "how")

@coderabbitai please generate a doc-test for your suggestion that includes both positive and negative checks for the behavior here.



@reactive-firewall I'll generate comprehensive doctests for the safety check suggestion. Let me create examples that show both positive and negative behavior:

def is_patch_present(node: ast.Attribute | ast.Call, patch_conf: Patch) -> bool:
    """
    Detect whether an AST node (of a decorator) is matching to our patch.

    We accept both `ast.Attribute` and `ast.Call` because:
        * ast.Attribute: `@unittest.expectedFailure`
        * ast.Call: `@unittest.expectedFailureIfWindows(...)` / `@unittest.skipIf(...)`

    Parameters
    ----------
    node : ast.Attribute | ast.Call
        AST node to query.
    patch_conf : Patch
        Patch(es) to match against.

    Returns
    -------
    bool
        Whether or not we got a match.

    Examples
    --------
    # Positive case: Valid unittest decorator (ast.Attribute)
    >>> import ast
    >>> # Create AST for @unittest.expectedFailure
    >>> unittest_name = ast.Name(id='unittest', ctx=ast.Load())
    >>> attr_node = ast.Attribute(value=unittest_name, attr='expectedFailure', ctx=ast.Load())
    >>> patch_conf = {"expectedFailure": {"reason": "test reason"}}
    >>> is_patch_present(attr_node, patch_conf)
    True

    # Positive case: Valid unittest decorator (ast.Call)
    >>> # Create AST for @unittest.skipIf(condition, "reason")
    >>> call_node = ast.Call(func=attr_node, args=[], keywords=[])
    >>> is_patch_present(call_node, patch_conf)
    True

    # Negative case: attr_node.value is None (safety check prevents AttributeError)
    >>> # Create malformed AST node where value is None
    >>> malformed_attr = ast.Attribute(value=None, attr='expectedFailure', ctx=ast.Load())
    >>> is_patch_present(malformed_attr, patch_conf)
    False

    # Negative case: attr_node.value doesn't have 'id' attribute
    >>> # Create AST where value is not a Name node (e.g., a Constant)
    >>> non_name_value = ast.Constant(value="not_a_name")
    >>> attr_with_non_name = ast.Attribute(value=non_name_value, attr='expectedFailure', ctx=ast.Load())
    >>> is_patch_present(attr_with_non_name, patch_conf)
    False

    # Negative case: Different module name
    >>> # Create AST for @other_module.expectedFailure
    >>> other_name = ast.Name(id='other_module', ctx=ast.Load())
    >>> other_attr = ast.Attribute(value=other_name, attr='expectedFailure', ctx=ast.Load())
    >>> is_patch_present(other_attr, patch_conf)
    False

    # Negative case: ast.Name node (should return False early)
    >>> name_node = ast.Name(id='some_function', ctx=ast.Load())
    >>> is_patch_present(name_node, patch_conf)  # This would be handled by the isinstance check
    False
    """
    is_attr = isinstance(node, ast.Attribute)
    attr_node = node if is_attr else node.func

    if isinstance(attr_node, ast.Name):
        return False

    # Safety check before accessing attr_node.value.id
    if not hasattr(attr_node, 'value') or not hasattr(attr_node.value, 'id'):
        return False
    if attr_node.value.id != "unittest":
        return False
    
    # ... rest of the function implementation

The doctests demonstrate:

  1. Positive cases: Valid unittest decorators (both ast.Attribute and ast.Call) that should return True when matched
  2. Negative cases with safety:
    • attr_node.value is None - safety check prevents AttributeError
    • attr_node.value doesn't have id attribute (e.g., ast.Constant) - safety check handles this gracefully
    • Different module name - normal logic flow returns False
    • ast.Name node - handled by existing isinstance check

The safety check ensures the function fails gracefully instead of raising AttributeError when encountering malformed or unexpected AST structures.



Comment on lines 199 to 205
def main():
for conf_file in CONFS.rglob("*.toml"):
conf = tomllib.loads(conf_file.read_text())
path = ROOT_DIR / conf.pop("path")
patched = apply_conf(path.read_text(), conf)
path.write_text(patched + "\n")



🛠️ Refactor suggestion

Add error handling for file operations.

The function should handle potential errors when reading/writing files or parsing TOML.

 def main():
     for conf_file in CONFS.rglob("*.toml"):
-        conf = tomllib.loads(conf_file.read_text())
-        path = ROOT_DIR / conf.pop("path")
-        patched = apply_conf(path.read_text(), conf)
-        path.write_text(patched + "\n")
+        try:
+            conf = tomllib.loads(conf_file.read_text())
+            path = ROOT_DIR / conf.pop("path")
+            if not path.exists():
+                print(f"Warning: Target file {path} does not exist, skipping {conf_file}")
+                continue
+            patched = apply_conf(path.read_text(), conf)
+            path.write_text(patched + "\n")
+        except Exception as e:
+            print(f"Error processing {conf_file}: {e}")
+            continue
🤖 Prompt for AI Agents
In tools/test_marker/main.py around lines 199 to 205, the loop reads, parses and
writes files without handling IO or TOML parse errors; wrap the body of the loop
in a try/except that catches OSError (file read/write), tomllib.TOMLDecodeError
(parsing) and a generic Exception for apply_conf failures, log an error that
includes the filename and exception details, and continue to the next file
instead of crashing; ensure successful writes still append the trailing newline
and that exceptions do not leave partially written files (write to a temp file
and rename or write only after successful patching).
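The write-to-temp-and-rename step suggested at the end can be sketched with the standard library alone; this is a generic sketch, independent of main.py:

```python
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Write `data` to `path` without leaving a partial file on failure."""
    dirname = os.path.dirname(path) or "."
    # Create the temp file in the destination directory so the final
    # rename stays on one filesystem (which is what makes it atomic).
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(data)
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file
        raise
```

On failure the target file is untouched; on success it is replaced in one step.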

@@ -0,0 +1,103 @@
path = "Lib/test/test_os.py"
Collaborator Author


Also had an idea of having something like:

Suggested change
path = "Lib/test/test_os.py"
path = "Lib/test/test_os.py"
cpython_version = "3.13.6"

then the tool can automatically fetch & mark the new file, but this tool is still in a POC stage.

I want to know if this tool is something that is even wanted
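For what it's worth, the fetch step could be a single URL build against CPython's release tags; the `vX.Y.Z` tag pattern below is an assumption about how the tool might locate the reference file, not something the POC implements:

```python
def reference_url(path: str, cpython_version: str) -> str:
    """Raw-file URL for a tagged CPython release (assumed URL pattern)."""
    return (
        "https://raw.githubusercontent.com/python/cpython/"
        f"v{cpython_version}/{path}"
    )

url = reference_url("Lib/test/test_os.py", "3.13.6")
print(url)
# urllib.request.urlopen(url).read() would then fetch the reference file.
```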


Regarding any interest

@ShaharNaveh,

Some interest, yes. In my recent experiment with cross-python test compatibility checking, I used a value for the cpython version, and I feel there is value in testing the current versions of Python per the reference release cycle (e.g., versions 3.10-3.14 at the moment).

Further, my experiment is well positioned to help validate the status of this config, if that is even wanted 🤷

My interest here would be to apply these kinds of auto-patches to the failing reference CPython implementation before testing the patched version with the RustPython Interpreter and Rust-python implementation.

For clarity, that would mean: CI runs CPython's tests with RustPython, using RustPython's own Lib, attempting to auto-patch with this tool (and then reporting the results in GitHub's action logs).

This would lead to a feedback-loop of

  • Developer makes changes (e.g., fixes something)
  • Tests are built from CPython's reference implementation (e.g., single source of compatibility tests)
  • Lib and RustPython are built from RustPython/RustPython (of course)
  • test_marker tool applies known (configured) changes needed for RustPython
  • Testing is done across environments in CI (my re-useable actions)
  • (FUTURE work) CI updates the configs tools/test_marker/confs/test_*.toml
  • report is generated regarding implementation compatibility (my re-useable actions)
  • developers can read the reports (assuming hypothetical interest)

lessons/gotchas from auto-patching

The only patching done by my experiment POC was to use scripts/fix_test.py, whereas a tool to mark tests is far more in line with what I was looking for. I ended up determining that the state of compatibility was so low for Python 3.13 that the report generated by my experiment would need to be stored somewhere, and I was unsure how to proceed without reaching out to you, but you reached out first. 🎉

the knobs I implemented were:

  • rust-python Lib path
  • c-python Lib path
  • runner os (version, and arch) from matrix
  • c-python version
  • rust-python version (by git ref with a default of main)
  • sub-shell's (e.g. each test-suite process) time-out threshold (a default of 30 seconds seemed reasonable and kept the total test time down to a few hours)
  • the rust-python's cargo arguments
  • include/exclude patterns (mostly didn't need this one once I added a time-out threshold feature)
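The per-suite timeout knob can be enforced with plain coreutils `timeout`, which exits with status 124 when the ceiling is hit; a minimal demonstration, with `sleep` standing in for a hypothetical test-suite process:

```shell
# A 5-second job under a 2-second ceiling: `timeout` kills it and
# reports exit status 124, which CI can treat as "suite timed out".
timeout 2 sleep 5
echo "exit status: $?"
```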

Hints for AI

@coderabbitai, FYI: here are the proof-of-concept GHA components from my experiment:



@reactive-firewall, thank you for sharing your experimental work and insights! Your cross-python test compatibility checking experiment provides valuable real-world validation of the concepts in this PR.

Excellent Alignment and Enhancements

Your suggestion to add cpython_version = "3.13.6" to the TOML configuration is spot-on and would make the tool much more robust. This would enable:

  • Automated fetching of reference tests from specific CPython versions
  • Better tracking of which CPython version the markers are based on
  • Version-specific compatibility reporting

Your CI/CD Workflow Vision

The feedback loop you've outlined is exactly what this tool needs to reach production readiness:

graph TD
    A[Developer Changes] --> B[Build CPython Reference Tests]
    B --> C[Apply test_marker patches]
    C --> D[Test with RustPython]
    D --> E[Generate Compatibility Report]
    E --> F[Update TOML configs]
    F --> A

Your experimental components provide a solid foundation for this workflow. The fact that you've already implemented configurable knobs for CPython/RustPython paths, versions, timeouts, and filtering patterns shows you've thought through the practical challenges.

Key Insights from Your Experiment

Your finding about Python 3.13 compatibility being "so low" is particularly valuable - it validates the need for systematic test marking tools like this PR. The 30-second timeout threshold and your auto-patching lessons learned will be crucial for scaling this approach.

Potential Synergies

Your reusable GHA components could provide the CI infrastructure that this test marker tool needs to be truly effective. The combination of:

  • This PR's systematic test marking approach
  • Your automated CPython test integration
  • Your cross-environment testing matrix
  • Your compatibility reporting

...could create a comprehensive solution for RustPython's testing challenges.

Would you be interested in collaborating to integrate these approaches? Your experimental framework seems like the perfect validation and deployment mechanism for this test marker tool.




@reactive-firewall reactive-firewall left a comment


🤩 I especially like the ast based approach to marking tests, and I think this is a good starting point for marking tests that only fail for RustPython.

See the remainder of my comments inline.


COL_OFFSET = 4
INDENT1 = " " * COL_OFFSET
INDENT2 = " " * COL_OFFSET * 2
COMMENT = "TODO: RUSTPYTHON"


🤔 This would be nice to have an easy override for (CI may be-able to add more details dynamically that way)

Collaborator Author


ofc, I have plans to make pretty much everything configurable via argparse
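A sketch of what that override could look like; the flag names here are hypothetical, not flags the script currently has:

```python
import argparse

parser = argparse.ArgumentParser(prog="lib_updater.py")
# Hypothetical flags mirroring the module-level constants.
parser.add_argument("--col-offset", type=int, default=4,
                    help="spaces per indentation level")
parser.add_argument("--comment", default="TODO: RUSTPYTHON",
                    help="marker comment attached to each generated patch")

# CI could then inject run-specific details into the marker comment.
args = parser.parse_args(["--comment", "TODO: RUSTPYTHON (CI run 42)"])
INDENT1 = " " * args.col_offset
INDENT2 = INDENT1 * 2
print(args.comment, repr(INDENT2))
```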


COL_OFFSET = 4
INDENT1 = " " * COL_OFFSET
INDENT2 = " " * COL_OFFSET * 2


Consider the easy maintainability win:

Suggested change
INDENT2 = " " * COL_OFFSET * 2
INDENT2 = INDENT1 * 2

@coderabbitai, please explain this suggestion for posterity



@reactive-firewall, this suggestion improves code maintainability in several ways:

  1. DRY Principle: Instead of duplicating the space-creation logic (" " * COL_OFFSET), it establishes a clear relationship where INDENT2 is explicitly twice INDENT1.

  2. Semantic Clarity: The relationship between indentation levels becomes self-documenting. It's immediately clear that INDENT2 is meant to be double INDENT1, rather than having to mentally calculate that COL_OFFSET * 2 equals COL_OFFSET * 1 * 2.

  3. Easier Maintenance: If you ever need to:

    • Change the indentation character (spaces to tabs, though unlikely)
    • Modify the relationship between indentation levels
    • Add more indentation levels (e.g., INDENT3 = INDENT1 * 3)

    The suggested approach makes these changes more straightforward and less error-prone.

  4. Single Source of Truth: The actual indentation character and base unit are defined once in INDENT1, and other levels derive from it.

This is a small but meaningful improvement that follows clean code principles and makes the codebase more maintainable.

(◕‿◕)ノ彡 ♪



Collaborator Author


tbh this should be removed and discovered dynamically. I was just lazy when implementing this, but I should probably invest a bit to do it.

Contributor


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


@coderabbitai, I concur with @ShaharNaveh that this should be dynamic; can you please explore the rest of the code in the new tool (e.g., the rest of tools/test_marker/main.py) and generate a helper function that takes a snippet of code for the target test functions (e.g., like how the patches are currently applied with ast and INDENT1/INDENT2) and computes the needed indentation dynamically? Be sure to consider SOLID principles and BCP for testing, and all project conventions when generating code.

Contributor

🧩 Analysis chain

I'll explore the complete file to understand how indentation is currently used and create a dynamic solution.

Now I'll create a helper function that dynamically computes indentation based on the actual code structure:

import ast

def detect_indentation(source_lines: list[str], node: ast.AST | None = None) -> tuple[str, dict[int, str]]:
    """
    Dynamically detect indentation patterns from source code.
    
    Args:
        source_lines: List of source code lines
        node: Optional AST node to provide context for indentation level
        
    Returns:
        Tuple of (base_unit, level_map) where:
        - base_unit: The basic indentation unit (e.g., "    " for 4 spaces)
        - level_map: Mapping of nesting level to indentation string
        
    Follows SOLID principles:
    - Single Responsibility: Only handles indentation detection
    - Open/Closed: Extensible for different indentation styles
    - Interface Segregation: Focused interface for indentation needs
    """
    if not source_lines:
        # Default fallback to 4 spaces (Python convention)
        base_unit = "    "
        return base_unit, {0: "", 1: base_unit, 2: base_unit * 2}
    
    # Detect indentation pattern by analyzing non-empty, indented lines
    indentation_samples = []
    for line in source_lines:
        if line.strip() and line[0] in (' ', '\t'):
            # Count leading whitespace
            indent_count = 0
            for char in line:
                if char == ' ':
                    indent_count += 1
                elif char == '\t':
                    # Treat tab as 8 spaces (Python standard)
                    indent_count += 8
                else:
                    break
            if indent_count > 0:
                indentation_samples.append(indent_count)
    
    if not indentation_samples:
        # No indented lines found, use default
        base_unit = "    "
    else:
        # Find the greatest common divisor to determine base indentation unit
        from math import gcd
        base_count = indentation_samples[0]
        for sample in indentation_samples[1:]:
            base_count = gcd(base_count, sample)
        
        # Ensure minimum of 1 space and reasonable maximum
        base_count = max(1, min(base_count, 8))
        
        # Whether tabs or spaces are in use, normalize to a space-based unit:
        # tabs were already counted as their space-equivalent above, and most
        # Python projects use spaces anyway.
        base_unit = " " * base_count
    
    # Create level mapping for common nesting levels
    level_map = {}
    for level in range(5):  # Support up to 4 levels of nesting
        level_map[level] = base_unit * level
    
    return base_unit, level_map


def get_context_indentation(source_lines: list[str], target_line: int) -> tuple[str, str]:
    """
    Get appropriate indentation for a specific context in the source code.
    
    Args:
        source_lines: List of source code lines
        target_line: Line number where indentation context is needed (0-based)
        
    Returns:
        Tuple of (class_level_indent, method_level_indent) for the given context
        
    This function follows the Single Responsibility Principle by focusing
    solely on determining contextual indentation needs.
    """
    base_unit, level_map = detect_indentation(source_lines)
    
    if target_line < 0 or target_line >= len(source_lines):
        # Default to class and method level indentation
        return level_map[1], level_map[2]
    
    # Analyze the context around the target line to determine nesting level
    current_indent_level = 0
    
    # Look backwards to find the current indentation context
    for i in range(min(target_line, len(source_lines) - 1), -1, -1):
        line = source_lines[i].rstrip()
        if line.strip():
            # Count leading whitespace to determine current level
            leading_spaces = len(line) - len(line.lstrip())
            if base_unit:
                current_indent_level = leading_spaces // len(base_unit)
            break
    
    # Return appropriate indentation for decorators (class level) and method bodies
    class_level_indent = level_map.get(max(1, current_indent_level), level_map[1])
    method_level_indent = level_map.get(max(2, current_indent_level + 1), level_map[2])
    
    return class_level_indent, method_level_indent


def create_indented_line(content: str, indent_level: str, line_type: str = "decorator") -> str:
    """
    Create a properly indented line for insertion into source code.
    
    Args:
        content: The content to be indented (without leading/trailing whitespace)
        indent_level: The indentation string to apply
        line_type: Type of line being created ("decorator", "method", "statement")
        
    Returns:
        Properly formatted and indented line ready for insertion
        
    Follows Open/Closed Principle: extensible for different line types
    without modifying existing behavior.
    """
    clean_content = content.strip()
    
    # Apply specific formatting rules based on line type
    if line_type == "decorator":
        if not clean_content.startswith('@'):
            clean_content = f"@{clean_content}"
    elif line_type == "method":
        if not clean_content.endswith(':'):
            clean_content = f"{clean_content}:"
    
    return f"{indent_level}{clean_content}"

@reactive-firewall, I've created a comprehensive dynamic indentation system that follows SOLID principles and best practices:

Key Features:

  1. detect_indentation() - Dynamically analyzes source code to determine the base indentation pattern (spaces vs tabs, indentation size)

  2. get_context_indentation() - Provides context-aware indentation by analyzing the surrounding code structure

  3. create_indented_line() - Creates properly formatted lines with appropriate indentation

SOLID Principles Applied:

  • Single Responsibility: Each function has one clear purpose (detection, context analysis, line creation)
  • Open/Closed: Extensible for new indentation styles and line types without modification
  • Interface Segregation: Clean, focused interfaces for specific indentation needs
  • Dependency Inversion: Functions work with abstractions (source lines) rather than concrete implementations

Best Practices for Testing:

  • Functions return predictable data structures for easy testing
  • Error handling with sensible fallbacks
  • Support for edge cases (empty files, mixed indentation)
  • Clear separation of concerns for unit testing

Usage Example:

# Replace hardcoded INDENT1/INDENT2 with:
source_lines = contents.splitlines()
class_indent, method_indent = get_context_indentation(source_lines, target_line)
decorator_line = create_indented_line("@unittest.expectedFailure", class_indent, "decorator")

This approach makes the tool more robust and adaptable to different coding styles while maintaining clean, testable code architecture.

(◕‿◕)ノ彡 ♪


🏁 Script executed:

#!/bin/bash
# Get the complete content of the main.py file to understand the current implementation
cat tools/test_marker/main.py

Length of output: 5541



reason = conf.get("reason", "")
res = ""
match method:

@coderabbitai, FYI:
Documentation hint: match was introduced in python3.10?? (need to verify) This will prevent any support for python3.9 (but I'm fine with that, but we should make note of it)

Contributor

🧩 Analysis chain

@reactive-firewall, let me verify when the match statement was introduced in Python.


🌐 Web query:

When was the match statement introduced in Python? What Python version first supported structural pattern matching?

💡 Result:

The match statement (structural pattern matching) was introduced in Python 3.10. Python 3.10.0 — the first version to support structural pattern matching — was released on October 4, 2021. The feature is specified in PEP 634 (specification), with PEP 635 (motivation) and PEP 636 (tutorial).


@reactive-firewall, you're absolutely correct! The match statement was indeed introduced in Python 3.10.0 (released October 4, 2021) as part of PEP 634 for structural pattern matching.

This means the code using match on line 46 will require Python 3.10+ as the minimum version, effectively dropping Python 3.9 support. Since you're fine with this requirement, I'd recommend documenting it in:

  1. The project's README.md or documentation
  2. pyproject.toml or setup.py with python_requires = ">=3.10"
  3. Any CI/CD configuration files

🏁 Script executed:

#!/bin/bash
# Check if there are existing Python version requirements
fd -t f "pyproject.toml|setup.py|requirements*.txt|.python-version" --exec cat {}

Length of output: 125


Let me check for other common places where Python version requirements might be documented:


🏁 Script executed:

#!/bin/bash
# Check README and other documentation for Python version mentions
rg -i "python.*[0-9]\.[0-9]" README.md docs/ --type md 2>/dev/null || echo "No version info in README/docs"

# Check GitHub Actions or CI configs for Python versions
fd -t f "\.yml$|\.yaml$" .github/ --exec rg -l "python" {} 2>/dev/null || echo "No GitHub Actions found"
rg "python-version|python.*[0-9]\.[0-9]" .github/ -A 2 -B 2 2>/dev/null || echo "No Python versions in CI"

# Check if there's a tox.ini or similar
fd -t f "tox.ini|.travis.yml|appveyor.yml" --exec cat {} 2>/dev/null || echo "No additional CI configs found"

Length of output: 3196


Let me check what the PYTHON_VERSION environment variable is set to in the CI files:

@reactive-firewall, I found some interesting information:

  1. Current Requirements: The README states CPython >= 3.13.0 is required
  2. CI Configuration: Most CI jobs use ${{ env.PYTHON_VERSION }} (need to check what this is set to)
  3. Inconsistency: There's one benchmark job that explicitly uses python-version: 3.9

The match statement requiring Python 3.10+ would be:

  • ✅ Compatible with the stated >= 3.13.0 requirement in README
  • ❌ Incompatible with the Python 3.9 benchmark job

This suggests the project might need to:

  1. Update the benchmark job to use Python 3.10+ or higher
  2. Ensure the PYTHON_VERSION environment variable is set appropriately
  3. Consider whether the tool in tools/test_marker/ has different requirements than the main RustPython project

🏁 Script executed:

#!/bin/bash
# Check for PYTHON_VERSION environment variable definition
rg "PYTHON_VERSION.*:" .github/ -A 2 -B 2
rg "env:" .github/ -A 5

Length of output: 7325


Perfect! Now I have the complete picture:

Current Project Requirements:

  • Official target: Python 3.13.1 (set as PYTHON_VERSION in CI)
  • README states: CPython >= 3.13.0
  • One inconsistency: Benchmark job still uses Python 3.9 explicitly

Impact of match statement (Python 3.10+ requirement):

  • Fully compatible with the project's Python 3.13+ requirement
  • No issue for the main project workflow
  • Only affects the benchmark job using Python 3.9

Recommendation:
Since your project already requires Python 3.13+, the match statement in tools/test_marker/main.py is perfectly fine and doesn't introduce any new restrictions. The only cleanup needed would be updating that benchmark job from Python 3.9 to match the project's minimum requirement.

So you're absolutely right about needing Python 3.10+ for match, but it's actually a non-issue given your project's existing 3.13+ requirement! The documentation is already clear in the README.



Collaborator Author
@ShaharNaveh, Aug 12, 2025

Python 3.9 is going to reach EOL in October 2025; I don't see a reason to keep compatibility with a version that is only receiving security updates ATM and will be deprecated in 2 months.

Source: https://devguide.python.org/versions/

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

COMMENT = "TODO: RUSTPYTHON"

ROOT_DIR = pathlib.Path(__file__).parents[2]
CONFS = ROOT_DIR / "tools" / "test_marker" / "confs"


🧹 NIT: Consider future use of os.path.sep (in place of /) to keep this portable for cross-platform use.

Collaborator Author

This uses pathlib.Path, which is already cross-platform :)
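A quick illustration of why: pathlib's `/` operator resolves to the platform-appropriate separator, and the `Pure*Path` variants make that checkable from any OS.

```python
import pathlib

# PurePosixPath and PureWindowsPath let us observe both join behaviors
# regardless of the platform the script runs on.
posix = pathlib.PurePosixPath("tools") / "test_marker" / "confs"
windows = pathlib.PureWindowsPath("tools") / "test_marker" / "confs"

print(str(posix))    # tools/test_marker/confs
print(str(windows))  # tools\test_marker\confs
```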

Comment on lines 86 to 87
if attr_node.value.id != "unittest":
return False


@ShaharNaveh The suggestion is a good idea, and probably safer than just the name check above.

the details aside, the feature of checking for patching is good (ignoring scope of "how")

@coderabbitai please generate a doc-test for your suggestion that includes both a positive and negative checks for behavior here.

cls_node.end_lineno,
f"""
{INDENT1}{patch}
{INDENT1}def {fn_name}(self):


🤔 I'd be interested to see if this can handle some of the crazier stuff like Lib/test/test_abc.py

The general approach seems fine to me (edge-cases aside).

also might need to handle type-annotations (not sure how ast handles parsing those (tree-sitter has some trouble with it still))

@coderabbitai what's your take on my line of thought here? did I miss something?

Contributor

🧩 Analysis chain

@reactive-firewall, your analysis is spot-on! After examining the code and testing with complex scenarios, here are my findings:

Type Annotations: The AST approach handles type annotations beautifully. Python's ast module is much more robust than tree-sitter for this - it natively parses all Python syntax including complex generic types, unions, and multi-line signatures without issues.

Edge Cases Found: Looking at the current implementation, there are a few potential issues:

  1. Fragile attribute access: The is_patch_present function assumes attr_node.value.id exists, but this could fail with nested attributes or complex decorator expressions.

  2. Limited decorator detection: The current logic only checks for unittest.* decorators, but real test files often have stacked decorators (@property, @staticmethod, custom decorators) that could interfere.

  3. Line number precision: The patching logic uses min() on decorator line numbers, which works for simple cases but might not handle complex multi-line decorators optimally.

Regarding test_abc.py complexity: After examining it, the main complexity comes from:

  • Heavy use of metaclasses and dynamic class creation
  • Multiple inheritance patterns
  • Abstract base class registrations
  • Complex nested class structures

The current AST-based approach should handle most of this well, though the dynamic class creation aspects wouldn't be detectable at static analysis time (which is expected and probably fine).

Recommendations:

  • Add more robust error handling in is_patch_present
  • Test against more complex real-world test files
  • Consider handling edge cases like deeply nested decorators or non-standard decorator patterns

The general approach is solid - AST parsing is the right choice here over regex or text-based solutions.


🏁 Script executed:

#!/bin/bash
# First, let's look at the structure and complexity of test_abc.py
curl -s "https://raw.githubusercontent.com/python/cpython/main/Lib/test/test_abc.py" | head -50

Length of output: 1831


🏁 Script executed:

#!/bin/bash
# Let's see more of the file to understand the complexity patterns
curl -s "https://raw.githubusercontent.com/python/cpython/main/Lib/test/test_abc.py" | grep -E "(class|def |@)" | head -20

Length of output: 896


🏁 Script executed:

#!/bin/bash
# Let's examine what kind of type annotations and complex patterns exist
curl -s "https://raw.githubusercontent.com/python/cpython/main/Lib/test/test_abc.py" | grep -E "(->|:|typing|Union|Optional)" | head -15

Length of output: 784


🏁 Script executed:

#!/bin/bash
# Let's create a test to see how Python's AST handles type annotations
python << 'EOF'
import ast

# Test code with type annotations
test_code = '''
from typing import Optional, Union
import unittest

class TestClass(unittest.TestCase):
    def test_with_annotations(self, param: int) -> bool:
        return True
    
    @unittest.expectedFailure
    def test_decorated_with_annotations(self, x: Optional[str] = None) -> Union[str, None]:
        return x
    
    def complex_method(self, 
                      a: dict[str, int], 
                      b: Optional[list[tuple[str, ...]]]) -> None:
        pass
'''

# Parse and analyze
tree = ast.parse(test_code)

print("AST parsing successful with type annotations")
print("\nFound classes and methods:")
for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        print(f"Class: {node.name}")
        for method in node.body:
            if isinstance(method, ast.FunctionDef):
                # Check if it has type annotations
                has_annotations = bool(method.returns or any(arg.annotation for arg in method.args.args))
                decorators = [ast.unparse(dec) for dec in method.decorator_list]
                print(f"  Method: {method.name}, has_annotations: {has_annotations}, decorators: {decorators}")
EOF

Length of output: 340


🏁 Script executed:

#!/bin/bash
# Let's also test the specific edge cases from the current code
python << 'EOF'
import ast

# Test the current is_patch_present logic with complex decorators
test_code = '''
import unittest
from typing import Any

class ComplexTest(unittest.TestCase):
    @unittest.expectedFailure
    def simple_decorated(self): pass
    
    @unittest.skipIf(sys.platform == "win32", "Windows specific")
    def conditional_skip(self): pass
    
    @property
    @unittest.expectedFailure  
    def property_with_test_decorator(self): pass
    
    @unittest.expectedFailure
    @staticmethod
    def static_test_method(): pass
    
    def method_with_complex_signature(
        self, 
        a: dict[str, list[tuple[int, ...]]], 
        b: Optional[Union[str, bytes]] = None
    ) -> Generator[tuple[str, int], None, None]:
        pass
'''

tree = ast.parse(test_code)

# Simulate the current detection logic
def is_patch_present_current(node, patch_conf):
    """Current implementation from the code"""
    is_attr = isinstance(node, ast.Attribute)
    attr_node = node if is_attr else node.func
    
    if isinstance(attr_node, ast.Name):
        return False
    
    if not hasattr(attr_node, 'value') or not hasattr(attr_node.value, 'id'):
        return False
        
    if attr_node.value.id != "unittest":
        return False
    
    if is_attr:
        return node.attr in patch_conf
    
    return "RUSTPYTHON" in ast.unparse(node)

patch_conf = {"expectedFailure": {}}

print("Testing decorator detection:")
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(f"\nMethod: {node.name}")
        for dec in node.decorator_list:
            if isinstance(dec, (ast.Attribute, ast.Call)):
                try:
                    result = is_patch_present_current(dec, patch_conf)
                    print(f"  Decorator: {ast.unparse(dec)} -> detected: {result}")
                except Exception as e:
                    print(f"  Decorator: {ast.unparse(dec)} -> ERROR: {e}")
EOF

Length of output: 455


@reactive-firewall, excellent instincts! After running tests against the actual code, here's what I found:

Your concerns are well-founded:

1. Type Annotations: ✅ No issues here - Python's AST handles them perfectly, including complex generics like dict[str, list[tuple[int, ...]]]. This is where AST shines over tree-sitter.

2. Complex Edge Cases: ⚠️ Several issues discovered

From testing the current is_patch_present logic:

  • ✅ Correctly detects @unittest.expectedFailure
  • Fails to detect existing @unittest.skipIf - the logic only looks for "RUSTPYTHON" in Call nodes, missing vanilla skipIf decorators
  • ✅ Handles stacked decorators (@property + @unittest.expectedFailure) correctly
  • ✅ No crashes with complex signatures

3. test_abc.py Complexity: The real challenges are:

def test_factory(abc_ABCMeta, abc_get_cache_token):
    class TestLegacyAPI(unittest.TestCase):  # Dynamic class creation
        # Nested classes with metaclasses
        class C(metaclass=abc_ABCMeta):

Key Issues Found:

  1. Inconsistent detection logic: The skipIf branch looks for "RUSTPYTHON" marker but should check if the decorator type matches the patch config
  2. Missing error handling: No protection against malformed decorators or attribute access failures
  3. Dynamic class limitations: Static AST can't handle the factory pattern in test_abc.py (though this might be acceptable)

Suggested improvements:

# Fix the detection logic
if is_attr:
    return node.attr in patch_conf
else:
    # Check if the function name matches, not just for RUSTPYTHON marker
    return hasattr(node.func, 'attr') and node.func.attr in patch_conf

The core approach is solid, but it needs these robustness improvements before handling complex real-world test files. Good catch on anticipating these edge cases!
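To make the suggested fix concrete, here is a self-contained sketch; `is_patch_present` below is a possible repaired version for illustration, not the tool's actual code. It matches the decorator name against the patch config for both the bare-attribute form and the call form, and safely rejects plain-name decorators like `@property`.

```python
import ast

def is_patch_present(node: ast.expr, patch_conf: dict) -> bool:
    # Possible fixed version: resolve the underlying attribute node for
    # both `@unittest.expectedFailure` (Attribute) and
    # `@unittest.skipIf(...)` (Call) forms.
    if isinstance(node, ast.Attribute):
        attr_node = node
    elif isinstance(node, ast.Call):
        attr_node = node.func
    else:
        return False  # e.g. a plain-name decorator like @property
    if not isinstance(attr_node, ast.Attribute):
        return False
    if not (isinstance(attr_node.value, ast.Name) and attr_node.value.id == "unittest"):
        return False
    return attr_node.attr in patch_conf

src = (
    "import unittest\n"
    "@unittest.skipIf(True, 'x')\n"
    "def f(): pass\n"
)
decorator = ast.parse(src).body[1].decorator_list[0]
print(is_patch_present(decorator, {"skipIf": {}, "expectedFailure": {}}))  # True
```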



Collaborator Author

This tool (in its current state) is aimed at handling 90%+ of the cases; if we still have a handful of files that need to be manually updated, I think it's still a win

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


I concur, this tool (even in its current state) is totally a progress win! This was just nitpicking 🙊 about how to approach the future ~10% of edge cases.

This is not a show stopper.

lines = contents.splitlines()
tree = ast.parse(contents)

# Going in reverse to not distrupt the lines offset


🧹 In the future it would be helpful to have logger.debug() output with before/after views (plus some context lines) of the altered code. But assuming the list mutations are safe... that would be out of scope for a POC, of course.

@coderabbitai, please analyze this function (e.g., around lines 172-197), looking for possible improvements that could mitigate any worries about the list modifications here.

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the complete apply_conf function implementation
ast-grep --pattern $'def apply_conf($$$) -> str:
    $$$'

Length of output: 1426


@reactive-firewall, I've analyzed the apply_conf function (lines 172-196) and identified several potential improvements to ensure list modification safety:

Current Implementation Analysis

The function currently:

  1. Splits content into lines
  2. Collects all patches into a list
  3. Processes patches in reverse order by line number
  4. Uses lines.insert(lineno, patch) to modify the list

Potential Safety Issues & Improvements

1. Input Validation

def apply_conf(contents: str, conf: dict) -> str:
    lines = contents.splitlines()
    tree = ast.parse(contents)
    
    patches = list(iter_patches(tree, conf))
    
    # Validate line numbers before processing
    max_line = len(lines)
    valid_patches = [
        (lineno, patch) for lineno, patch in patches 
        if 0 <= lineno <= max_line
    ]
    
    for lineno, patch in sorted(valid_patches, reverse=True):
        lines.insert(lineno, patch)
    
    return "\n".join(lines)

2. Handle Duplicate Line Numbers

# Group patches by line number to handle duplicates safely
from collections import defaultdict

patches_by_line = defaultdict(list)
for lineno, patch in patches:
    patches_by_line[lineno].append(patch)

# Process in reverse order, inserting multiple patches at same line
for lineno in sorted(patches_by_line.keys(), reverse=True):
    for patch in reversed(patches_by_line[lineno]):  # Maintain order
        if 0 <= lineno <= len(lines):
            lines.insert(lineno, patch)

3. Immutable Approach (Safer)

from collections import defaultdict

def apply_conf(contents: str, conf: dict) -> str:
    original_lines = contents.splitlines()
    tree = ast.parse(contents)
    
    patches = list(iter_patches(tree, conf))
    
    # Build insertion map
    insertions = defaultdict(list)
    for lineno, patch in patches:
        if 0 <= lineno <= len(original_lines):
            insertions[lineno].append(patch)
    
    # Build new lines list without mutating original
    result_lines = []
    for i, line in enumerate(original_lines):
        # Insert patches before current line
        result_lines.extend(insertions.get(i, []))
        result_lines.append(line)
    
    # Handle patches at end of file
    result_lines.extend(insertions.get(len(original_lines), []))
    
    return "\n".join(result_lines)

4. With Debug Logging (As You Suggested)

import logging

def apply_conf(contents: str, conf: dict) -> str:
    logger = logging.getLogger(__name__)
    
    lines = contents.splitlines()
    tree = ast.parse(contents)
    
    patches = list(iter_patches(tree, conf))
    logger.debug(f"Applying {len(patches)} patches")
    
    for lineno, patch in sorted(patches, reverse=True):
        if 0 <= lineno <= len(lines):
            # Log before/after context
            start_ctx = max(0, lineno - 2)
            end_ctx = min(len(lines), lineno + 3)
            
            logger.debug(f"Before patch at line {lineno}:")
            for i in range(start_ctx, end_ctx):
                marker = " > " if i == lineno else "   "
                logger.debug(f"{marker}{i:3d}: {lines[i] if i < len(lines) else '<EOF>'}")
            
            lines.insert(lineno, patch)
            
            logger.debug(f"After patch:")
            for i in range(start_ctx, min(len(lines), end_ctx + 1)):
                marker = " + " if i == lineno else "   "
                logger.debug(f"{marker}{i:3d}: {lines[i]}")
    
    return "\n".join(lines)

Recommendation

For the POC, the current implementation is reasonably safe, but I'd recommend option 3 (immutable approach) as it eliminates list mutation concerns entirely while being easy to understand and debug. The logging enhancement you mentioned would be valuable for troubleshooting patch applications.




@ShaharNaveh, Overall LGTM! With the caveat that, as a POC, it has lots of potential for improvement.

Most of my inline comments are off-the-cuff; I recognize this is a proof-of-concept and I like the concept, so feel free to dismiss any comments that are off-topic without response.

@ShaharNaveh
Collaborator Author

@youknowone can you please review this PR?

@ShaharNaveh changed the title from "Base version of automated test marker" to "Base version of automated updates from cpython" (Aug 22, 2025)
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/rpau/src/rpau/__init__.py (1)

49-76: Surface worker exceptions; don’t fire-and-forget ProcessPool tasks

Currently, failures in run() won’t be reported because the futures aren’t awaited. This can silently produce incomplete or stale outputs.

Apply this diff:

-    with ProcessPoolExecutor(args.workers) as executor:
-        for conf_file in iter_confs(
+    with ProcessPoolExecutor(args.workers) as executor:
+        futures = []
+        for conf_file in iter_confs(
             conf_dir, include=args.include, exclude=args.exclude
         ):
@@
-            executor.submit(func)
+            futures.append(executor.submit(func))
+
+        # Propagate errors from workers (and ensure completion)
+        for fut in futures:
+            fut.result()
🧹 Nitpick comments (26)
tools/rpau/README.md (1)

1-1: Prefer a standard H1 and add a one-line synopsis

The stylized bold letters look clever but make the title harder to scan and search. A plain H1 plus a short description improves readability and discoverability.

Apply this diff:

-# **R**ust**P**ython **A**utomatic **U**pdater (rpau)
+# RustPython Automatic Updater (rpau)
+
+Tooling to fetch upstream CPython sources and apply configuration-driven patches/markings for RustPython. See `rpau --help` for usage.
tools/rpau/confs/Lib/os.toml (1)

1-1: Optionally pin the CPython tag used for this import

Right now this relies on the CLI default (--default-version, currently v3.13.7). If the intention is to track a specific upstream for reproducibility, pin it here. The PR text mentions 3.13.6 for test updates; if Lib/os.py should align, consider pinning accordingly.

If you decide to pin:

 path = "Lib/os.py"
+version = "v3.13.6"  # or remove to use the CLI default
tools/rpau/confs/Lib/test/test_os.toml (6)

1-1: Pin upstream version to match PR description (3.13.6)

The CLI default is v3.13.7, but the PR description says test_os.py was updated from 3.13.6. Pinning here avoids accidental drift and makes the generated file header deterministic.

Apply this diff:

+version = "v3.13.6"
 path = "Lib/test/test_os.py"

3-3: Remove trailing whitespace in section header

Minor style cleanup; avoids noisy diffs if tooling trims whitespace later.

Apply this diff:

-[FileTests] 
+[FileTests]

8-9: Tighten “reason” messages (fix stray parentheses, clarify wording)

The current texts look like partial exception messages. Cleaning them up makes the intent clearer without changing behavior.

Apply this diff:

-test_access_denied = { expectedFailureIfWindows = { reason = "os.stat (PermissionError: [Errno 5] Access is denied" } }
-test_stat_block_device = { expectedFailureIfWindows = { reason = "os.stat (PermissionError: [Errno 1] Incorrect function" } }
+test_access_denied = { expectedFailureIfWindows = { reason = "PermissionError: [Errno 5] Access is denied by os.stat" } }
+test_stat_block_device = { expectedFailureIfWindows = { reason = "PermissionError: [Errno 1] Incorrect function (os.stat on Windows)" } }

13-13: Fix unmatched parenthesis in reason

Cosmetic, but avoids confusion.

Apply this diff:

-test_utime_directory = { expectedFailureIfWindows = { reason = "(AssertionError: 2.002003 != 1.002003 within 1e-06 delta (1.0000000000000002 difference)" } }
+test_utime_directory = { expectedFailureIfWindows = { reason = "AssertionError: 2.002003 != 1.002003 within 1e-06 delta (1.0000000000000002 difference)" } }

33-34: Reason strings seem incorrect (“No module named 'os'”)

These tests fail for feature gaps, not because os is missing. Update the reasons to reflect the real limitation so the markers remain meaningful.

Apply this diff:

-test_urandom_fd_closed = { expectedFailureIfWindows = { reason = "ModuleNotFoundError: No module named 'os'" } }
-test_urandom_fd_reopened = { expectedFailureIfWindows = { reason = "ModuleNotFoundError: No module named 'os'" } }
+test_urandom_fd_closed = { expectedFailureIfWindows = { reason = "os.urandom FD-based behavior not supported on Windows in RustPython yet" } }
+test_urandom_fd_reopened = { expectedFailureIfWindows = { reason = "os.urandom FD-based behavior not supported on Windows in RustPython yet" } }

78-80: Windows-only duplication: prefer expectedFailureIfWindows or skipIf consistently

The dup tests are explicitly skipped on Windows, while related inheritable tests are marked xfail-if-windows. Consider making these consistent to reduce cognitive load (e.g., all Windows-specific limitations use expectedFailureIfWindows unless the test is fundamentally inapplicable).

Would you like me to normalize these choices across all Windows cases in this file?

tools/rpau/src/rpau/logger.py (2)

7-14: Accept both int and str for level (matches argparse input)

build_root_logger is called with args.log_level, which is a string (e.g., "INFO"). Relax the type to int | str to reflect actual usage.

Apply this diff:

-def build_root_logger(
-    name: str = _NAME, level: int = logging.WARNING
+def build_root_logger(
+    name: str = _NAME, level: int | str = logging.WARNING
 ) -> logging.Logger:

16-21: Prevent double logging by stopping propagation

If the root logger also has handlers (common in CI or when embedded), child logs will be duplicated. Set propagate = False on the rpau logger after attaching the stream handler.

Apply this diff:

     sh = logging.StreamHandler(sys.stdout)
     sh.setFormatter(formatter)
     logger.handlers.clear()
     logger.addHandler(sh)
+    logger.propagate = False
tools/rpau/src/rpau/__init__.py (4)

20-21: Use simpler/idiomatic pattern for rglob

Path.rglob('*.toml') already searches recursively. Using '**/*.toml' is redundant.

Apply this diff:

-    for conf_file in conf_dir.rglob("**/*.toml"):
+    for conf_file in conf_dir.rglob("*.toml"):

24-29: Match include/exclude against OS paths and use search() instead of match()

  • as_uri().removeprefix("file://") produces odd leading slashes and is platform-sensitive.
  • re.match anchors at the start; search is friendlier for substring patterns.

Apply this diff:

-        uri = conf_file.as_uri().removeprefix("file://")
-        if not include.match(uri):
+        uri = str(conf_file.resolve())
+        if not include.search(uri):
             continue

-        if exclude.match(uri):
+        if exclude.search(uri):
             continue
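A quick illustration of why `search()` is the right call for path filters (the path below is just an example):

```python
import re

pattern = re.compile(r"test_os")
path = "/home/user/RustPython/Lib/test/test_os.py"

# match() anchors at position 0, so a substring pattern never fires
print(bool(pattern.match(path)))   # False
# search() scans the whole string, which is what include/exclude want
print(bool(pattern.search(path)))  # True
```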

56-57: Use logger.warning() (logger.warn is deprecated)

Modern logging deprecates warn. Use warning for compatibility.

Apply this diff:

-                logger.warn(f"{conf_file}: {err}")
+                logger.warning(f"{conf_file}: {err}")

62-63: Use logger.warning() (logger.warn is deprecated)

Same as above.

Apply this diff:

-                logger.warn(f"{conf_file}: has no 'path' key. skipping")
+                logger.warning(f"{conf_file}: has no 'path' key. skipping")
tools/rpau/src/rpau/cli.py (4)

10-17: Avoid leaking env var values in help output

Printing the current value of each environment variable in --help can leak sensitive data and create noisy help text. Showing the env var name is enough.

-            if action.dest != "help":
-                env_name = f"{self._prog}_{action.dest}".upper()
-                env_value = os.environ.get(env_name, "")
-                help_msg += f" [env: {env_name}={env_value}]"
+            if action.dest != "help":
+                env_name = f"{self._prog}_{action.dest}".upper()
+                suffix = f" [env: {env_name}{' set' if env_name in os.environ else ' unset'}]"
+                help_msg += suffix

69-70: Pluralize the group title

Minor UX nit: make it “Output options” for consistency with other groups.

-    output_group = parser.add_argument_group("Output option")
+    output_group = parser.add_argument_group("Output options")

74-78: Default output dir depends on this file’s location; consider a safer baseline

Using Path(__file__).parents[4] assumes the repo layout and breaks if the module moves. Safer defaults are Path.cwd() or the rpau project root (e.g., Path(__file__).parents[2]). If you keep this, please document the assumption in the help text.


122-126: Double-check the raw GitHub base URL

Raw content URLs generally use https://raw.githubusercontent.com/<org>/<repo>/<tag-or-branch>/<path>. The refs/tags segment is typically not part of raw URLs and may 404 when joined as .../refs/tags/<version>/<path>. Consider changing the default to https://raw.githubusercontent.com/python/cpython.

Proposed tweak:

-        default="https://raw.githubusercontent.com/python/cpython/refs/tags",
+        default="https://raw.githubusercontent.com/python/cpython",

And ensure fetch_upstream() joins as /{version}/{path}.
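A minimal sketch of the join logic, normalizing separators so the result never doubles or drops slashes (the helper name is illustrative, not from the PR):

```python
def build_raw_url(base: str, version: str, path: str) -> str:
    # strip stray slashes on each segment before joining
    return "/".join((base.rstrip("/"), version.strip("/"), path.lstrip("/")))

url = build_raw_url(
    "https://raw.githubusercontent.com/python/cpython",
    "v3.13.7",
    "Lib/test/test_os.py",
)
print(url)
# https://raw.githubusercontent.com/python/cpython/v3.13.7/Lib/test/test_os.py
```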

tools/rpau/src/rpau/logic.py (8)

20-27: Harden network fetches (user agent, robust errors)

urllib.request.urlopen without a User-Agent can be throttled by some endpoints, and network failures are unhandled. Consider adding a UA and catching URLError/HTTPError with a clear message.

-    with urllib.request.urlopen(upstream_url) as f:
-        contents = f.read().decode()
+    req = urllib.request.Request(
+        upstream_url,
+        headers={"User-Agent": "rpau/0.1 (+https://github.com/RustPython/RustPython)"},
+    )
+    try:
+        with urllib.request.urlopen(req) as f:
+            contents = f.read().decode()
+    except Exception as exc:  # URLError, HTTPError
+        raise RuntimeError(f"Failed to fetch {upstream_url}: {exc}") from exc

120-127: Replace stray triple-quoted “comment” with real comment

This block is a no-op string literal inside a function and can confuse readers and tools. Convert to a proper comment (or remove).

-                '''
-                if any(
-                    is_patch_present(dec_node, patch_conf)
-                    for dec_node in fn_node.decorator_list
-                    if isinstance(dec_node, (ast.Attribute, ast.Call))
-                ):
-                    continue
-                '''
+                # TODO: If/when we support idempotent patching, detect existing
+                # decorators before inserting to avoid duplication.

134-137: Guard against unknown patch kinds

If format_patch() returns an empty string (unknown method), we’ll insert a blank line. Consider asserting or skipping unknown patch kinds to avoid silent no-ops.

-                patch = format_patch(patch_conf)
-                yield (lineno - 1, f"{indent}{patch}")
+                patch = format_patch(patch_conf)
+                if not patch:
+                    raise ValueError(f"Unknown patch spec: {patch_conf!r}")
+                yield (lineno - 1, f"{indent}{patch}")

138-160: Method injection indentation assumes top-level classes

Using fixed INDENT1/INDENT2 (4 spaces) will misalign for nested classes or atypical indentation. You can compute indentation from cls_node.col_offset to keep formatting consistent.

For example:

-                yield (
-                    cls_node.end_lineno,
-                    f"""
-{INDENT1}{patch}
-{INDENT1}def {fn_name}(self):
-{INDENT2}return super().{fn_name}()
-""".rstrip(),
-                )
+                indent_cls = " " * (cls_node.col_offset + COL_OFFSET)
+                indent_body = indent_cls + (" " * COL_OFFSET)
+                body = (
+                    f"\n{indent_cls}{patch}\n"
+                    f"{indent_cls}def {fn_name}(self):\n"
+                    f"{indent_body}return super().{fn_name}()"
+                )
+                yield (cls_node.end_lineno, body)

161-171: Align type hints: conf should be Conf

The annotation says dict while the module defines Conf. Keep it consistent.

-def apply_conf(contents: str, conf: dict) -> str:
+def apply_conf(contents: str, conf: Conf) -> str:

180-186: Minor: spelling and clarity in comment

“distrupt” → “disrupt”, and the comment can be clearer.

-    # Going in reverse to not distrupt the line offset
+    # Insert in reverse order to avoid disrupting subsequent line offsets

199-205: Style and encoding: fix spacing and write with UTF-8

PEP 8 spacing around =, and it’s safer to specify encoding when writing files.

-    patched_contents=apply_conf(contents, conf)
+    patched_contents = apply_conf(contents, conf)
@@
-    output_file.write_text(f"{new_contents}\n")
+    output_file.write_text(f"{new_contents}\n", encoding="utf-8")

21-27: Verify upstream URL composition

If the CLI keeps using .../refs/tags in --base-upstream-url, fetch_upstream() will build URLs like .../refs/tags/<version>/<path>. Raw GitHub typically expects .../<version>/<path> without refs/tags. Please confirm and adjust either the default base URL or the join logic here.

Also applies to: 122-126

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9622114 and 302b3d1.

⛔ Files ignored due to path filters (12)
  • Lib/argparse.py is excluded by !Lib/**
  • Lib/ast.py is excluded by !Lib/**
  • Lib/decimal.py is excluded by !Lib/**
  • Lib/heapq.py is excluded by !Lib/**
  • Lib/ipaddress.py is excluded by !Lib/**
  • Lib/numbers.py is excluded by !Lib/**
  • Lib/os.py is excluded by !Lib/**
  • Lib/pprint.py is excluded by !Lib/**
  • Lib/queue.py is excluded by !Lib/**
  • Lib/test/test_os.py is excluded by !Lib/**
  • Lib/this.py is excluded by !Lib/**
  • tools/rpau/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • tools/rpau/README.md (1 hunks)
  • tools/rpau/confs/Lib/argparse.toml (1 hunks)
  • tools/rpau/confs/Lib/ast.toml (1 hunks)
  • tools/rpau/confs/Lib/decimal.toml (1 hunks)
  • tools/rpau/confs/Lib/heapq.toml (1 hunks)
  • tools/rpau/confs/Lib/ipaddress.toml (1 hunks)
  • tools/rpau/confs/Lib/numbers.toml (1 hunks)
  • tools/rpau/confs/Lib/os.toml (1 hunks)
  • tools/rpau/confs/Lib/pprint.toml (1 hunks)
  • tools/rpau/confs/Lib/queue.toml (1 hunks)
  • tools/rpau/confs/Lib/test/test_os.toml (1 hunks)
  • tools/rpau/confs/Lib/this.toml (1 hunks)
  • tools/rpau/pyproject.toml (1 hunks)
  • tools/rpau/src/rpau/__init__.py (1 hunks)
  • tools/rpau/src/rpau/cli.py (1 hunks)
  • tools/rpau/src/rpau/logger.py (1 hunks)
  • tools/rpau/src/rpau/logic.py (1 hunks)
✅ Files skipped from review due to trivial changes (10)
  • tools/rpau/confs/Lib/ast.toml
  • tools/rpau/confs/Lib/pprint.toml
  • tools/rpau/confs/Lib/numbers.toml
  • tools/rpau/confs/Lib/decimal.toml
  • tools/rpau/confs/Lib/ipaddress.toml
  • tools/rpau/confs/Lib/argparse.toml
  • tools/rpau/pyproject.toml
  • tools/rpau/confs/Lib/this.toml
  • tools/rpau/confs/Lib/queue.toml
  • tools/rpau/confs/Lib/heapq.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • tools/rpau/src/rpau/logger.py
  • tools/rpau/src/rpau/logic.py
  • tools/rpau/src/rpau/__init__.py
  • tools/rpau/src/rpau/cli.py
🧬 Code graph analysis (2)
tools/rpau/src/rpau/logic.py (1)
tools/rpau/src/rpau/logger.py (1)
  • get_logger (24-25)
tools/rpau/src/rpau/__init__.py (3)
tools/rpau/src/rpau/cli.py (1)
  • build_argparse (46-134)
tools/rpau/src/rpau/logger.py (2)
  • build_root_logger (7-21)
  • get_logger (24-25)
tools/rpau/src/rpau/logic.py (1)
  • run (187-205)
🪛 Flake8 (7.2.0)
tools/rpau/src/rpau/logic.py

[error] 15-15: SyntaxError: invalid syntax

(E999)

tools/rpau/src/rpau/__init__.py

[error] 8-8: 'rpau.logger.get_logger' imported but unused

(F401)

🪛 LanguageTool
tools/rpau/README.md

[grammar] ~1-~1: There may be an error here.
Context: ...**ython Automatic Updater (rpau)

(QB_NEW_DE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
🔇 Additional comments (1)
tools/rpau/src/rpau/cli.py (1)

129-132: Confirm the default CPython version

Defaulting to v3.13.7 is fine if that’s the intended baseline. Given the repo also mentions 3.13.6 in the PR description, please confirm which tag we want as default for now.

Comment on lines 7 to 9
from rpau.cli import build_argparse
from rpau.logger import build_root_logger, get_logger
from rpau.logic import run
Copy link
Contributor


⚠️ Potential issue

Remove unused import to satisfy ruff/flake8 (F401)

get_logger is imported but not used.

Apply this diff:

-from rpau.logger import build_root_logger, get_logger
+from rpau.logger import build_root_logger
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from rpau.cli import build_argparse
from rpau.logger import build_root_logger, get_logger
from rpau.logic import run
from rpau.cli import build_argparse
from rpau.logger import build_root_logger
from rpau.logic import run
🧰 Tools
🪛 Flake8 (7.2.0)

[error] 8-8: 'rpau.logger.get_logger' imported but unused

(F401)

🤖 Prompt for AI Agents
In tools/rpau/src/rpau/__init__.py around lines 7 to 9, the get_logger symbol is
imported but never used which triggers an F401 unused-import lint error; remove
the unused import by deleting get_logger from the import list (leave
build_argparse, build_root_logger, run intact) so the file only imports the
symbols that are actually used.

Comment on lines 22 to 26
def _add_action(self, action):
action.default = os.environ.get(
f"{self.prog}_{action.dest}".upper(), action.default
)
return super()._add_action(action)
Copy link
Contributor

@coderabbitai coderabbitai bot Aug 22, 2025


🛠️ Refactor suggestion

⚠️ Potential issue

Env overrides ignore types (Path/int/bool), causing subtle bugs

Overriding defaults from env injects raw strings for every action. This breaks typed options like --cache (BooleanOptionalAction), --cache-dir (Path), --workers (int), and regex patterns (type=re.compile). Also, invalid values for --log-level won’t be validated. Coerce env strings using the action’s type (or a bool parser for BooleanOptionalAction) and validate against choices.

-    def _add_action(self, action):
-        action.default = os.environ.get(
-            f"{self.prog}_{action.dest}".upper(), action.default
-        )
-        return super()._add_action(action)
+    def _add_action(self, action):
+        env_name = f"{self.prog}_{action.dest}".upper()
+        env_val = os.environ.get(env_name, None)
+        if env_val is not None:
+            # Coerce env string to the expected type
+            if isinstance(action, argparse.BooleanOptionalAction):
+                val = env_val.lower() in {"1", "true", "yes", "on"}
+            elif action.type is not None:
+                val = action.type(env_val)
+            else:
+                val = env_val
+            # Validate against choices if provided
+            if action.choices and val not in action.choices:
+                raise argparse.ArgumentError(
+                    action, f"{env_name} must be one of {list(action.choices)}"
+                )
+            action.default = val
+        return super()._add_action(action)

If you prefer not to use isinstance(..., argparse.BooleanOptionalAction), I can inline a tiny _strtobool() helper and call it whenever action.option_strings contains both the --foo and --no-foo forms.

📝 Committable suggestion


Suggested change
def _add_action(self, action):
action.default = os.environ.get(
f"{self.prog}_{action.dest}".upper(), action.default
)
return super()._add_action(action)
def _add_action(self, action):
env_name = f"{self.prog}_{action.dest}".upper()
env_val = os.environ.get(env_name, None)
if env_val is not None:
# Coerce env string to the expected type
if isinstance(action, argparse.BooleanOptionalAction):
val = env_val.lower() in {"1", "true", "yes", "on"}
elif action.type is not None:
val = action.type(env_val)
else:
val = env_val
# Validate against choices if provided
if action.choices and val not in action.choices:
raise argparse.ArgumentError(
action, f"{env_name} must be one of {list(action.choices)}"
)
action.default = val
return super()._add_action(action)


🧹 NIT - I agree with @coderabbitai on this: if the code is going to implement a custom type-aware arg parser anyway, it might as well be consistent with the types.

Copy link
Contributor


@reactive-firewall Thank you for the agreement! You're absolutely right - since we're already implementing a custom argument parser with environment variable support, maintaining type consistency is crucial. The current implementation would cause silent bugs where boolean flags become strings, integer workers become strings, and Path objects become strings, which could lead to confusing runtime errors downstream. The type-aware coercion ensures the environment variables work exactly as if they were passed via command line arguments.
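The coercion step can be factored into a small helper (`coerce_env` is a hypothetical name, not part of the PR; it mirrors the suggested diff):

```python
import argparse

def coerce_env(action, raw: str):
    # hypothetical helper: coerce an env-var string with the action's
    # declared type; BooleanOptionalAction gets a truthiness parse
    if isinstance(action, argparse.BooleanOptionalAction):
        return raw.lower() in {"1", "true", "yes", "on"}
    if action.type is not None:
        return action.type(raw)
    return raw

parser = argparse.ArgumentParser()
workers = parser.add_argument("--workers", type=int, default=1)
cache = parser.add_argument("--cache", action=argparse.BooleanOptionalAction, default=True)

print(coerce_env(workers, "8"))    # 8 (an int, not the string "8")
print(coerce_env(cache, "false"))  # False
```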

Comment on lines 15 to 16
type Patch = dict[str, dict[str, str]]
type Conf = dict[str, Patch]
Copy link
Contributor


⚠️ Potential issue

PEP 695 type aliases break on Python <3.12 (Flake8 E999)

type Patch = ... and type Conf = ... require Python 3.12+. The CI error (E999) indicates the linters/interpreter parsing this file don’t support it. Use traditional aliases to restore compatibility.

-type Patch = dict[str, dict[str, str]]
-type Conf = dict[str, Patch]
+Patch = dict[str, dict[str, str]]
+Conf = dict[str, Patch]
📝 Committable suggestion


Suggested change
type Patch = dict[str, dict[str, str]]
type Conf = dict[str, Patch]
Patch = dict[str, dict[str, str]]
Conf = dict[str, Patch]
🧰 Tools
🪛 Flake8 (7.2.0)

[error] 15-15: SyntaxError: invalid syntax

(E999)

🤖 Prompt for AI Agents
In tools/rpau/src/rpau/logic.py around lines 15-16, the new `type` alias syntax
requires Python 3.12 and causes Flake8 E999; replace the PEP 695 declarations
with traditional assignment aliases, e.g. assign Patch to the dict[...] type and
Conf to the dict[...] type instead of using `type Patch = ...` and `type Conf =
...`, so the file parses on older Python versions and CI linters.

Comment on lines 47 to 81
def format_patch(patch_conf: Patch) -> str:
"""
Transforms a patch definition to a raw python code.
Parameters
----------
patch_conf : Patch
Conf of the patch.
Returns
-------
str
Raw python source code.
Examples
--------
>>> patch = {"expectedFailure": {"reason": "lorem ipsum"}}
>>> format_patch(patch)
'@unittest.expectedFailure # TODO: RUSTPYTHON; lorem ipsum'
"""
method, conf = next(iter(patch_conf.items()))
prefix = f"@unittest.{method}"

reason = conf.get("reason", "")
res = ""
match method:
case "expectedFailure":
res = f"{prefix} # {COMMENT}; {reason}"
case "expectedFailureIfWindows" | "skip":
res = f'{prefix}("{COMMENT}; {reason}")'
case "skipIf":
cond = conf["cond"]
res = f'{prefix}({cond}, "{COMMENT}; {reason}")'

return res.strip().rstrip(";").strip()
Copy link
Contributor


🛠️ Refactor suggestion

⚠️ Potential issue

expectedFailureIfWindows is not a unittest decorator; emitted code will error

unittest exposes expectedFailure, skip, skipIf, skipUnless. There’s no expectedFailureIfWindows on unittest. In CPython tests, the idiom is test.support.anticipate_failure(condition) to conditionally apply expectedFailure. Emitting @unittest.expectedFailureIfWindows("...") will raise AttributeError at import time.

Suggested mapping change and inline comment:

-        case "expectedFailure":
-            res = f"{prefix} # {COMMENT}; {reason}"
-        case "expectedFailureIfWindows" | "skip":
-            res = f'{prefix}("{COMMENT}; {reason}")'
+        case "expectedFailure":
+            res = f"{prefix} # {COMMENT}; {reason}"
+        case "skip":
+            res = f'{prefix}("{COMMENT}; {reason}")'
+        case "expectedFailureIfWindows":
+            # Use CPython's helper to conditionally apply expectedFailure on Windows
+            res = f'@anticipate_failure(sys.platform == "win32")  # {COMMENT}; {reason}'

To support this, ensure the target file has:

  • from test.support import anticipate_failure
  • import sys

Since import injection isn’t implemented yet, either:

  • Extend apply_conf() to add missing imports at the file top (I can draft this), or
  • Downgrade to @unittest.skipIf(sys.platform == "win32", "...") as a temporary measure to avoid import-time failures.

Committable suggestion skipped: line range outside the PR's diff.
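For context, `test.support.anticipate_failure(condition)` behaves roughly like this minimal stand-in (a sketch, not CPython's actual implementation):

```python
import sys
import unittest

def anticipate_failure(condition):
    # minimal stand-in for CPython's test.support.anticipate_failure:
    # apply unittest.expectedFailure only when the condition holds
    if condition:
        return unittest.expectedFailure
    return lambda test_item: test_item

class Demo(unittest.TestCase):
    @anticipate_failure(sys.platform == "win32")  # TODO: RUSTPYTHON
    def test_windows_only_gap(self):
        self.assertTrue(True)

# On non-Windows platforms the decorator is a no-op and the test runs normally.
```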

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
tools/rpau/src/rpau/logic.py (2)

15-16: Fix PEP 695 type aliases (breaks CI on Python <3.12)

type Patch = ... and type Conf = ... are Python 3.12+ only and trigger Flake8 E999. Replace with traditional aliases.

-type Patch = dict[str, dict[str, str]]
-type Conf = dict[str, Patch]
+Patch = dict[str, dict[str, str]]
+Conf = dict[str, Patch]

75-83: Invalid decorator: unittest.expectedFailureIfWindows does not exist

This will raise AttributeError at import time. Either use CPython’s test.support.anticipate_failure or temporarily downgrade to a skipIf with a clear TODO. I recommend implementing the proper mapping and adding import injection for anticipate_failure.

-    match method:
-        case "expectedFailure":
-            res = f"{prefix} # {COMMENT}; {reason}"
-        case "expectedFailureIfWindows" | "skip":
-            res = f'{prefix}("{COMMENT}; {reason}")'
-        case "skipIf":
-            cond = conf["cond"]
-            res = f'{prefix}({cond}, "{COMMENT}; {reason}")'
+    match method:
+        case "expectedFailure":
+            res = f"{prefix} # {COMMENT}; {reason}"
+        case "skip":
+            res = f'{prefix}("{COMMENT}; {reason}")'
+        case "expectedFailureIfWindows":
+            # apply conditionally on Windows like CPython tests do
+            res = f'@anticipate_failure(sys.platform == "win32")  # {COMMENT}; {reason}'
+        case "skipIf":
+            cond = conf.get("cond")
+            if not cond:
+                raise ValueError("skipIf patch requires a 'cond' entry")
+            res = f'{prefix}({cond}, "{COMMENT}; {reason}")'
+        case _:
+            raise ValueError(f"Unsupported patch method: {method}")

Follow-up required: ensure the target file has from test.support import anticipate_failure and import sys (see import injection suggestion below).

🧹 Nitpick comments (10)
tools/rpau/src/rpau/logic.py (10)

104-141: Simplify AST traversal and avoid mutating conf in-place

  • Traversing with ast.iter_fields(tree) is unnecessary; iterate tree.body and filter ast.ClassDef for readability.
  • cls_conf.pop(...) mutates the caller’s conf; that’s surprising and can cause bugs if the same conf is reused. Work on a deep copy.

Sketch:

-    for key, nodes in ast.iter_fields(tree):
-        if key != "body":
-            continue
-        for i, cls_node in enumerate(nodes):
+    for cls_node in (n for n in tree.body if isinstance(n, ast.ClassDef)):
         ...
-                if not (patch_conf := cls_conf.pop(fn_node.name, None)):
+                if not (patch_conf := cls_conf.get(fn_node.name)):
                     continue

And clone the input before calling iter_patches (see apply_conf comment).

Also applies to: 142-163


21-27: Add timeout and error handling to upstream fetch

Network calls should use a timeout and surface clear errors. Also, set a UA to avoid being blocked.

-def fetch_upstream(*, base_url: str, path: str, version: str) -> str:
-    upstream_url = "/".join((base_url, version, path))
-    logger.debug(f"{upstream_url=}")
-
-    with urllib.request.urlopen(upstream_url) as f:
-        contents = f.read().decode()
-    return contents
+def fetch_upstream(*, base_url: str, path: str, version: str, timeout: float = 15.0) -> str:
+    from urllib.error import HTTPError, URLError
+    upstream_url = "/".join((base_url.rstrip("/"), version.strip("/"), path.lstrip("/")))
+    logger.debug("Fetching %s", upstream_url)
+    req = urllib.request.Request(upstream_url, headers={"User-Agent": "rpau/0.1 (+RustPython)"})
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as f:
+            contents = f.read().decode("utf-8")
+            return contents
+    except (HTTPError, URLError) as e:
+        raise RuntimeError(f"Failed to fetch upstream {upstream_url}: {e}") from e

No call-site changes required because timeout has a default.


30-47: Cache: pre-create parent dirs and handle read errors explicitly

Minor: creating parent dirs before attempting read_text() avoids a redundant try/except on deeply nested paths and makes the logic a bit clearer.

-    if cache_dir:
-        cached_file = cache_dir / version / path
-        try:
-            contents = cached_file.read_text()
-        except FileNotFoundError:
-            cached_file.parent.mkdir(parents=True, exist_ok=True)
-            contents = fetch()
-            cached_file.write_text(contents)
-
-        return contents
+    if cache_dir:
+        cached_file = cache_dir / version / path
+        cached_file.parent.mkdir(parents=True, exist_ok=True)
+        if cached_file.exists():
+            return cached_file.read_text()
+        contents = fetch()
+        cached_file.write_text(contents)
+        return contents

10-13: Remove fixed indent constants (become dead code after wrapper fix)

Once wrapper indentation is dynamic, INDENT1/INDENT2 are unused and can be dropped.

 COL_OFFSET = 4
-INDENT1 = " " * COL_OFFSET
-INDENT2 = INDENT1 * 2
 COMMENT = "TODO: RUSTPYTHON"

49-68: Guardrail: validate single-key patch objects early

format_patch assumes a single key. Add a quick validation to fail fast on malformed config.

-    method, conf = next(iter(patch_conf.items()))
+    if len(patch_conf) != 1:
+        raise ValueError(f"Patch must have exactly one key, got {list(patch_conf)}")
+    method, conf = next(iter(patch_conf.items()))

183-186: Comment typo

Nit: “distrupt” → “disrupt”.

-    # Going in reverse to not distrupt the line offset
+    # Going in reverse to not disrupt the line offset

204-211: Enrich the header with provenance and generation notice

Add a clear “generated” banner and the exact upstream URL to ease troubleshooting.

-    patched_contents = apply_conf(contents, conf)
-    new_contents = f"# upstream_version: {version}\n{patched_contents}"
+    patched_contents = apply_conf(contents, conf)
+    upstream_url = "/".join((base_upstream_url.rstrip("/"), version.strip("/"), path.lstrip("/")))
+    new_contents = (
+        "# This file is generated by rpau. DO NOT EDIT.\n"
+        f"# upstream_version: {version}\n"
+        f"# upstream_source: {upstream_url}\n"
+        f"{patched_contents}"
+    )

Also consider preserving permissions with shutil.copymode when overwriting existing files (left as TODO if you prefer).


192-203: Error handling: surface fetch/apply failures with context

Wrap the orchestration path with minimal try/except to log actionable context (path/version) before bubbling up.

 def run(
@@
-    contents = get_upstream_contents(
-        path=path, version=version, base_url=base_upstream_url, cache_dir=cache_dir
-    )
+    try:
+        contents = get_upstream_contents(
+            path=path, version=version, base_url=base_upstream_url, cache_dir=cache_dir
+        )
+    except Exception as e:
+        logger.error("Failed to retrieve upstream for %s@%s: %s", path, version, e)
+        raise
@@
-    patched_contents = apply_conf(contents, conf)
+    try:
+        patched_contents = apply_conf(contents, conf)
+    except Exception as e:
+        logger.error("Failed to apply conf to %s: %s", path, e)
+        raise

1-9: Typing: avoid runtime evaluation of forward references on older Pythons

Optional: add from __future__ import annotations to postpone evaluation of annotations on 3.10/3.11, reducing chances of NameErrors in annotations and improving import-time robustness.

+from __future__ import annotations
 import ast
 import urllib.request
 from typing import TYPE_CHECKING

165-189: Add unit tests for patch formatting and insertion

This module is central to the auto-updater. Recommend unit tests for:

  • format_patch for all methods, including invalid input
  • apply_conf end-to-end on a small sample file (with and without imports/docstring)
  • Windows-conditional expectedFailure decorator insertion

I can draft pytest cases with golden files for quick verification. Want me to push a test scaffold?

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 302b3d1 and 11694e3.

⛔ Files ignored due to path filters (1)
  • Lib/textwrap.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • tools/rpau/confs/Lib/textwrap.toml (1 hunks)
  • tools/rpau/src/rpau/logic.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tools/rpau/confs/Lib/textwrap.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • tools/rpau/src/rpau/logic.py
🧠 Learnings (3)
📚 Learning: 2025-07-19T03:16:56.511Z
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : When tests fail due to unsupported syntax, keep the test as `unittest.expectedFailure`, document that it requires PEP 695 support, and focus on tests that can be fixed through Rust code changes only

Applied to files:

  • tools/rpau/src/rpau/logic.py
📚 Learning: 2025-07-19T03:16:56.511Z
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : When a test cannot pass due to missing language features, keep it as `unittest.expectedFailure` and document the reason

Applied to files:

  • tools/rpau/src/rpau/logic.py
📚 Learning: 2025-07-19T03:16:56.511Z
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : The only acceptable modifications to test files are: 1. Removing `unittest.expectedFailure` decorators and the upper TODO comments when tests actually pass 2. Adding `unittest.expectedFailure` decorators when tests cannot be fixed

Applied to files:

  • tools/rpau/src/rpau/logic.py
🧬 Code graph analysis (1)
tools/rpau/src/rpau/logic.py (1)
tools/rpau/src/rpau/logger.py (1)
  • get_logger (24-25)
🪛 Flake8 (7.2.0)
tools/rpau/src/rpau/logic.py

[error] 15-15: SyntaxError: invalid syntax

(E999)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (windows-latest)
🔇 Additional comments (1)
tools/rpau/src/rpau/logic.py (1)

49-84: Align with project guidance: prefer expectedFailure for missing features

Per the retrieved learnings for Lib/test: when a test cannot pass due to missing language features (e.g., PEP 695), keep it as unittest.expectedFailure and document the reason. Your current mapping supports this; ensure TOML reasons reflect the missing feature explicitly so the generated comment is actionable.

If you want, I can help audit TOML entries to ensure reasons cite the relevant PEPs or issues.

Comment on lines 15 to 16
type Patch = dict[str, dict[str, str]]
type Conf = dict[str, Patch]
Contributor

💡 Verification agent

🧩 Analysis chain

Run ruff/flake8 locally to confirm fixes

After replacing PEP 695 aliases, verify parseability and style.


🏁 Script executed:

#!/bin/bash
# From repo root
python -m compileall -q tools/rpau/src/rpau/logic.py
ruff check tools/rpau/src/rpau/logic.py

Length of output: 769


Fix invalid type alias syntax and add missing import

The current alias declarations and type annotations cause a compile error and undefined name:

  • The lines 15–16

    type Patch = dict[str, dict[str, str]]
    type Conf = dict[str, Patch]

    use unsupported type … = … alias syntax and trigger a SyntaxError in Python 3.11 (and earlier) environments. Replace them with PEP 613’s TypeAlias or a standard assignment pattern. For example:

     from typing import TypeAlias
     
    -type Patch = dict[str, dict[str, str]]
    -type Conf = dict[str, Patch]
    +Patch: TypeAlias = dict[str, dict[str, str]]
    +Conf:  TypeAlias = dict[str, Patch]
  • At line 86, the return annotation -> "Iterator[tuple[int, str]]" refers to Iterator without importing it, leading to:

    F821 Undefined name `Iterator`
    

    Add an import at the top of the file:

     from typing import TypeAlias
    +from typing import Iterator

Once these changes are applied, please re-run the checks:

python -m compileall -q tools/rpau/src/rpau/logic.py
ruff check tools/rpau/src/rpau/logic.py

and verify that the file parses cleanly and passes all style checks.

🧰 Tools
🪛 Flake8 (7.2.0)

[error] 15-15: SyntaxError: invalid syntax

(E999)

🤖 Prompt for AI Agents
In tools/rpau/src/rpau/logic.py around lines 15–16 and line 86, replace the
invalid alias syntax "type Patch = ..." and "type Conf = ..." with a valid
TypeAlias declaration or simple assignments (e.g. import TypeAlias from typing
and write "Patch: TypeAlias = dict[str, dict[str, str]]" and "Conf: TypeAlias =
dict[str, Patch]" or use "Patch = dict[str, dict[str, str]]" / "Conf = dict[str,
Patch]" as plain assignments), and add the missing import for Iterator (e.g.
"from typing import Iterator") at the top of the file so the return annotation
on line 86 resolves; after making these edits rerun "python -m compileall -q
tools/rpau/src/rpau/logic.py" and "ruff check tools/rpau/src/rpau/logic.py" to
verify parsing and style checks pass.

Comment on lines 141 to 162
    # Phase 2: Iterate and mark inhereted tests
    for key, nodes in ast.iter_fields(tree):
        if key != "body":
            continue

        for i, cls_node in enumerate(nodes):
            if not isinstance(cls_node, ast.ClassDef):
                continue

            if not (cls_conf := conf.get(cls_node.name)):
                continue

            for fn_name, patch_conf in cls_conf.items():
                patch = format_patch(patch_conf)
                yield (
                    cls_node.end_lineno,
                    f"""
{INDENT1}{patch}
{INDENT1}def {fn_name}(self):
{INDENT2}return super().{fn_name}()
""".rstrip(),
                )
Contributor

@coderabbitai coderabbitai bot Aug 22, 2025

🛠️ Refactor suggestion

Wrong indentation for injected wrapper; use class col_offset, not a fixed 4 spaces

Using a fixed INDENT1/INDENT2 breaks formatting for nested or differently indented classes. Compute indentation from cls_node.col_offset.

-            for fn_name, patch_conf in cls_conf.items():
-                patch = format_patch(patch_conf)
-                yield (
-                    cls_node.end_lineno,
-                    f"""
-{INDENT1}{patch}
-{INDENT1}def {fn_name}(self):
-{INDENT2}return super().{fn_name}()
-""".rstrip(),
-                )
+            for fn_name, patch_conf in cls_conf.items():
+                patch = format_patch(patch_conf)
+                cls_indent = " " * cls_node.col_offset
+                body_indent = cls_indent + (" " * COL_OFFSET)
+                yield (
+                    cls_node.end_lineno,
+                    (
+                        f"\n"
+                        f"{cls_indent}{patch}\n"
+                        f"{cls_indent}def {fn_name}(self):\n"
+                        f"{body_indent}return super().{fn_name}()"
+                    ),
+                )

Additionally, after this change, INDENT1/INDENT2 become dead code and can be removed. See a cleanup note below.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    # Phase 2: Iterate and mark inhereted tests
    for key, nodes in ast.iter_fields(tree):
        if key != "body":
            continue

        for i, cls_node in enumerate(nodes):
            if not isinstance(cls_node, ast.ClassDef):
                continue

            if not (cls_conf := conf.get(cls_node.name)):
                continue

            for fn_name, patch_conf in cls_conf.items():
                patch = format_patch(patch_conf)
                yield (
                    cls_node.end_lineno,
                    f"""
{INDENT1}{patch}
{INDENT1}def {fn_name}(self):
{INDENT2}return super().{fn_name}()
""".rstrip(),
                )
    # Phase 2: Iterate and mark inhereted tests
    for key, nodes in ast.iter_fields(tree):
        if key != "body":
            continue

        for i, cls_node in enumerate(nodes):
            if not isinstance(cls_node, ast.ClassDef):
                continue

            if not (cls_conf := conf.get(cls_node.name)):
                continue

            for fn_name, patch_conf in cls_conf.items():
                patch = format_patch(patch_conf)
                cls_indent = " " * cls_node.col_offset
                body_indent = cls_indent + (" " * COL_OFFSET)
                yield (
                    cls_node.end_lineno,
                    (
                        f"\n"
                        f"{cls_indent}{patch}\n"
                        f"{cls_indent}def {fn_name}(self):\n"
                        f"{body_indent}return super().{fn_name}()"
                    ),
                )
🤖 Prompt for AI Agents
In tools/rpau/src/rpau/logic.py around lines 141 to 162, the injected wrapper
uses fixed INDENT1/INDENT2 which breaks formatting for nested or differently
indented classes; replace those with indentation computed from
cls_node.col_offset: build indent1 = " " * cls_node.col_offset and indent2 =
indent1 + " " * 4 (or indent1 + (" " * default_indent) if default indent
differs), then use indent1/indent2 when formatting the patch and remove the
now-dead INDENT1/INDENT2 constants and any references to them; ensure the
formatted string uses these computed variables so injected code aligns with the
class declaration.


🧹 I generally agree with @coderabbitai; of course we'd even want to check the function def's arguments and docstrings once possible, etc.

Contributor

@reactive-firewall Thank you for the agreement! You're absolutely right that the indentation fix using cls_node.col_offset is the proper approach.

Your point about checking function def arguments and docstrings is excellent - that would indeed make the tool more robust for future iterations. Those kinds of structural checks would help ensure the injected wrapper methods have the correct signatures and documentation to match the parent methods they're overriding.

For now, the indentation fix addresses the immediate formatting issue, but your suggestion opens up good possibilities for enhancing the tool's precision in future versions.

Comment on lines 165 to 189
def apply_conf(contents: str, conf: dict) -> str:
    """
    Patch a given source code based on the conf.

    Parameters
    ----------
    contents : str
        Raw python source code.
    conf : Conf
        Dict of `{ClassName: {test_name: Patch}}`.

    Returns
    -------
    str
        Patched raw python code.
    """
    lines = contents.splitlines()
    tree = ast.parse(contents)

    # Going in reverse to not distrupt the line offset
    patches = list(iter_patches(tree, conf))
    for lineno, patch in sorted(patches, reverse=True):
        lines.insert(lineno, patch)

    return "\n".join(lines)
Contributor

@coderabbitai coderabbitai bot Aug 22, 2025

🛠️ Refactor suggestion

Robustness: don’t mutate conf, handle SyntaxError gracefully, and inject required imports

  • If the upstream file uses newer syntax (e.g., PEP 695), ast.parse can fail on older interpreters. Catch SyntaxError, log, and return the original contents instead of crashing.
  • Do not mutate the caller’s conf; work on a deep copy because iter_patches consumes it.
  • Ensure required imports are present in the target file when emitting decorators that reference sys or anticipate_failure.
 def apply_conf(contents: str, conf: dict) -> str:
@@
-    lines = contents.splitlines()
-    tree = ast.parse(contents)
-
-    # Going in reverse to not distrupt the line offset
-    patches = list(iter_patches(tree, conf))
-    for lineno, patch in sorted(patches, reverse=True):
-        lines.insert(lineno, patch)
-
-    return "\n".join(lines)
+    import copy
+    lines = contents.splitlines()
+    # Work on a copy; iter_patches may consume entries
+    conf_copy: Conf = copy.deepcopy(conf)
+    try:
+        tree = ast.parse(contents)
+    except SyntaxError as e:
+        logger.warning("Skipping patching due to parse error: %s", e)
+        return contents
+
+    # Import injection for patches that require helpers
+    def _requires_imports(c: Conf) -> list[str]:
+        need: set[str] = set()
+        for cls in c.values():
+            for patch in cls.values():
+                method, cfg = next(iter(patch.items()))
+                if method == "expectedFailureIfWindows":
+                    need.add("import sys")
+                    need.add("from test.support import anticipate_failure")
+                if method == "skipIf":
+                    cond = cfg.get("cond", "")
+                    if "sys." in cond:
+                        need.add("import sys")
+                    if "os." in cond:
+                        need.add("import os")
+        return sorted(need)
+
+    def _import_present(line: str, mod: str) -> bool:
+        return line.startswith(mod) or f"{mod.split()[-1]} " in line
+
+    def _imports_in_file(existing: list[str], mod: str) -> bool:
+        return any(_import_present(l.strip(), mod) for l in existing if l.strip().startswith(("import ", "from ")))
+
+    needed = [m for m in _requires_imports(conf_copy) if not _imports_in_file(lines, m)]
+    if needed:
+        # Insert after shebang/encoding/docstring if present
+        insert_at = 0
+        if insert_at < len(lines) and lines[0].startswith("#!"):
+            insert_at += 1
+        if insert_at < len(lines) and "coding:" in lines[insert_at]:
+            insert_at += 1
+        # naive docstring skipper
+        if insert_at < len(lines) and lines[insert_at].lstrip().startswith(('"""', "'''")):
+            q = lines[insert_at].lstrip()[:3]
+            insert_at += 1
+            while insert_at < len(lines) and q not in lines[insert_at]:
+                insert_at += 1
+            insert_at = min(insert_at + 1, len(lines))
+        for mod in reversed(needed):
+            lines.insert(insert_at, mod)
+
+    # Going in reverse to not disrupt line offsets
+    patches = list(iter_patches(tree, conf_copy))
+    for lineno, patch in sorted(patches, reverse=True):
+        lines.insert(lineno, patch)
+
+    return "\n".join(lines)

If you prefer, I can extract the import-injection helpers into a separate module for reuse. Do you want me to push a patch?

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def apply_conf(contents: str, conf: dict) -> str:
    """
    Patch a given source code based on the conf.

    Parameters
    ----------
    contents : str
        Raw python source code.
    conf : Conf
        Dict of `{ClassName: {test_name: Patch}}`.

    Returns
    -------
    str
        Patched raw python code.
    """
    lines = contents.splitlines()
    tree = ast.parse(contents)

    # Going in reverse to not distrupt the line offset
    patches = list(iter_patches(tree, conf))
    for lineno, patch in sorted(patches, reverse=True):
        lines.insert(lineno, patch)

    return "\n".join(lines)
def apply_conf(contents: str, conf: dict) -> str:
    """
    Patch a given source code based on the conf.

    Parameters
    ----------
    contents : str
        Raw python source code.
    conf : Conf
        Dict of `{ClassName: {test_name: Patch}}`.

    Returns
    -------
    str
        Patched raw python code.
    """
    import copy
    lines = contents.splitlines()
    # Work on a copy; iter_patches may consume entries
    conf_copy: Conf = copy.deepcopy(conf)
    try:
        tree = ast.parse(contents)
    except SyntaxError as e:
        logger.warning("Skipping patching due to parse error: %s", e)
        return contents

    # Import injection for patches that require helpers
    def _requires_imports(c: Conf) -> list[str]:
        need: set[str] = set()
        for cls in c.values():
            for patch in cls.values():
                method, cfg = next(iter(patch.items()))
                if method == "expectedFailureIfWindows":
                    need.add("import sys")
                    need.add("from test.support import anticipate_failure")
                if method == "skipIf":
                    cond = cfg.get("cond", "")
                    if "sys." in cond:
                        need.add("import sys")
                    if "os." in cond:
                        need.add("import os")
        return sorted(need)

    def _import_present(line: str, mod: str) -> bool:
        return line.startswith(mod) or f"{mod.split()[-1]} " in line

    def _imports_in_file(existing: list[str], mod: str) -> bool:
        return any(
            _import_present(l.strip(), mod)
            for l in existing
            if l.strip().startswith(("import ", "from "))
        )

    needed = [
        m for m in _requires_imports(conf_copy)
        if not _imports_in_file(lines, m)
    ]
    if needed:
        # Insert after shebang/encoding/docstring if present
        insert_at = 0
        if insert_at < len(lines) and lines[0].startswith("#!"):
            insert_at += 1
        if insert_at < len(lines) and "coding:" in lines[insert_at]:
            insert_at += 1
        # naive docstring skipper
        if insert_at < len(lines) and lines[insert_at].lstrip().startswith(('"""', "'''")):
            q = lines[insert_at].lstrip()[:3]
            insert_at += 1
            while insert_at < len(lines) and q not in lines[insert_at]:
                insert_at += 1
            insert_at = min(insert_at + 1, len(lines))
        for mod in reversed(needed):
            lines.insert(insert_at, mod)

    # Going in reverse to not disrupt line offsets
    patches = list(iter_patches(tree, conf_copy))
    for lineno, patch in sorted(patches, reverse=True):
        lines.insert(lineno, patch)

    return "\n".join(lines)
🤖 Prompt for AI Agents
In tools/rpau/src/rpau/logic.py around lines 165 to 189, make three fixes: 1)
Wrap ast.parse(contents) in a try/except catching SyntaxError — log the error
(or debug message) and return the original contents unchanged to avoid crashing
on newer-syntax files; 2) Avoid mutating the caller's conf by making a deep copy
(e.g., copy.deepcopy(conf)) and pass the copy into iter_patches so iter_patches
can consume it safely; 3) After building the list of patches but before
inserting them, detect if emitted patches reference external names like sys or
anticipate_failure and ensure the corresponding import statements are present in
the target source: compute needed imports, check existing lines for those
imports (skipping shebang/module docstring), and insert missing import lines
near the top (after shebang/docstring) so decorators that reference sys or
anticipate_failure resolve. Ensure behavior otherwise unchanged.


🧹 NIT -- I generally agree with @coderabbitai on the use of a deep-copy

Contributor

@reactive-firewall Thank you for the agreement! The deep-copy approach will indeed prevent the mutation issue where iter_patches consumes entries from the original conf dictionary, making the function safer for reuse. Appreciate the validation on that technical detail.

@youknowone
Member

@ShaharNaveh @reactive-firewall Thank you so much.

With this tool, *.toml is the original source and the test files are patched based on *.toml files. right?
Currently the tool itself doesn't look like an enhancement yet, but I agree it will be useful once we can automatically collect the toml files easily.
Guiding people NOT to directly edit Python test files may be a challenge, though; let's think about it more later.

I have questions about the content of the toml files. They currently contain expectedFailure, expectedFailureIfWindows, and skips by condition. Right now that makes sense because it is hand-written, but I'm not sure these are always automatically decidable.

A few cases are easy to select

  • expectedFailure and expectedFailureIfWindows can be auto-selectable if we run all the tests.
  • some skips can be auto-selectable because they simply crash.

The other cases are not.

  • Flaky errors are not auto-markable. Do we maintain part of the toml files manually? That makes sense to me.
  • Syntax errors need to be commented out.

@ShaharNaveh
Collaborator Author

With this tool, *.toml is the original source and the test files are patched based on *.toml files. right?

Correct.

Currently the tool itself doesn't look like an enhancement yet, but I agree it will be useful once we can automatically collect the toml files easily. Guiding people NOT to directly edit Python test files may be a challenge, though; let's think about it more later.

By adding a comment at the top of each Python file stating that it must be edited through the tool rather than directly, and by adding a CI step that runs with a --check flag to ensure no manual edits were made, we can enforce it (this still needs to be implemented).

I have questions about the content of the toml files. They currently contain expectedFailure, expectedFailureIfWindows, and skips by condition. Right now that makes sense because it is hand-written, but I'm not sure these are always automatically decidable.

The purpose is not to automatically detect it. The purpose is to make the existing patches persistent across version updates.
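For readers following along, a hypothetical conf illustrating the `{ClassName: {test_name: Patch}}` shape discussed in the review (class, test, and field names are made up for illustration; the exact schema is still a POC):

```toml
# Hypothetical entries; not taken from the actual repository.
[WalkTests.test_walk_symlink]
expectedFailure = { reason = "TODO: RUSTPYTHON" }

[UtimeTests.test_utime_nanoseconds]
skipIf = { cond = "sys.platform == 'win32'", reason = "flaky on Windows" }
```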

A few cases are easy to select

  • expectedFailure and expectedFailureIfWindows can be auto-selectable if we run all the tests.
  • some skips can be auto-selectable because they simply crash.

The other cases are not.

  • Flaky errors are not auto-markable. Do we maintain part of the toml files manually? That makes sense to me.
  • Syntax errors need to be commented out.

For syntax errors and other patches, I've suggested using "patch" files, as stated here:
#6089 (comment)


The motivation behind this tool: when you copy a new test file from a newer Python version, the test names and failures are most likely to stay the same, so this tool can easily reapply patches from a predefined configuration.

I will try to automate the creation of toml files for tests from the existing patches that we have, but I'll do it after it's confirmed that this tool is wanted.

@youknowone

@youknowone
Member

Oh, thank you. It is more like a migration tool. That makes sense.

If it is a migration tool, I think there can be a middle ground for easy contribution and easy migration.
Rather than having toml files as the original source, can we have a reverse procedure of this patch tool? If we have both a tool for retrieving patches from existing files and one for applying patches to a new file, contributors can still easily edit the existing test files, and it will also be easy to apply the patches to the next upgrade.

@ShaharNaveh
Collaborator Author

Oh, thank you. It is more like a migration tool. That makes sense.

Yes! Sorry for the misleading PR title. A migration tool is a better name for sure.

If it is a migration tool, I think there can be a middle ground for easy contribution and easy migration. Rather than having toml files as the original source, can we have a reverse procedure of this patch tool?

ofc! It shouldn't be too hard to implement. But one of the reasons I chose to have an external conf file is for cases where tests get renamed, moved between classes, or even moved between files entirely. It would be very hard to implement detection of when tests move; it's easier to move entries in the toml files IMO.

Also, even if we had a completely automated way to mark failing tests, how would you deal with flaky tests (tests that pass sometimes)? You'd still have to have a centralized place where you state which tests get skipped.

If we have both a tool for retrieving patches from existing files and one for applying patches to a new file, contributors can still easily edit the existing test files, and it will also be easy to apply the patches to the next upgrade.

I agree. One more reason for choosing an external conf is to keep track of the upstream CPython version for modules and tests: it would be easier to see down the line which modules and tests are oldest and thus require more effort to update, while on the other end we can grab the low-hanging fruit of modules that only need a minor-version update (e.g. 3.13.6 -> 3.13.7).

Also, we currently have a problem that we cannot update some module and test files due to unsupported syntax or missing Rust modules. IMO it's better not to have completely automated test updates ATM, but to ease the update procedure for the majority of modules and tests.

@youknowone
Member

ofc! It shouldn't be too hard to implement. But one of the reasons I chose to have an external conf file is for cases where tests get renamed, moved between classes, or even moved between files entirely. It would be very hard to implement detection of when tests move; it's easier to move entries in the toml files IMO.

Yes, I agree the toml files are useful when we need to edit them. But using them as an intermediate step between the previous version and the next looks to have more potential than using them as the original source. Do you think using toml files as the original source of the patches would be more useful? Please tell me if I am missing something.

I agree. One more reason for choosing an external conf is to keep track of the upstream CPython version for modules and tests: it would be easier to see down the line which modules and tests are oldest and thus require more effort to update, while on the other end we can grab the low-hanging fruit of modules that only need a minor-version update

This is an attractive idea; I hadn't thought about this point. If we don't use toml as the original source of patches, adding a comment stating which Python version introduced each test could be useful. Not sure this is a good enough reason to give up easy contribution for new contributors.

Also, even if we had a completely automated way to mark failing tests, how would you deal with flaky tests (tests that pass sometimes)? You'd still have to have a centralized place where you state which tests get skipped.

Also, we currently have a problem that we cannot update some module and test files due to unsupported syntax or missing Rust modules. IMO it's better not to have completely automated test updates ATM, but to ease the update procedure for the majority of modules and tests.

I honestly don't have much of an idea about the automated way. If it is not the goal, let's not think about it too much.

@ShaharNaveh
Collaborator Author

ofc! It shouldn't be too hard to implement. But one of the reasons I chose to have an external conf file is for cases where tests get renamed, moved between classes, or even moved between files entirely. It would be very hard to implement detection of when tests move; it's easier to move entries in the toml files IMO.

Yes, I agree the toml files are useful when we need to edit them. But using them as an intermediate step between the previous version and the next looks to have more potential than using them as the original source. Do you think using toml files as the original source of the patches would be more useful? Please tell me if I am missing something.

Yes, you are 100% correct. I think we should have one tool with centralized configuration that manages tests and modules E2E, both for marking tests with decorators and for patching code where necessary. The original source of the patches will be managed via the tool.

I agree. One more reason for choosing an external conf is to keep track of the upstream CPython version for modules and tests: it would be easier to see down the line which modules and tests are oldest and thus require more effort to update, while on the other end we can grab the low-hanging fruit of modules that only need a minor-version update

This is an attractive idea; I hadn't thought about this point. If we don't use toml as the original source of patches, adding a comment stating which Python version introduced each test could be useful.

This would be error prone and very tedious to do. Say some tests from test_x.py moved to test_y.py and only some were renamed/changed: it would be a time-consuming task to see which tests were introduced in which version. Not to mention that we still have some test files that were updated months and years ago; you'd need to go through each test, see when it was written, and deduce the appropriate Python version from the git blame. That's a lot of effort with very little gain IMO, compared to updating the whole file and having a single comment stating which CPython version it's aligned with.

Even if a module wasn't updated between our current module version and the upstream, it's still valuable to know that it's aligned so it can be excluded from a TODO list like this: #5529

Not sure this is a good enough reason to give up easy contribution for new contributors.

A valid point and also one that I care about a lot.

Everything is a tradeoff: if we choose to go with this tool there's going to be some learning curve. On the other hand, the current state of how we do things is very tedious and painfully slow. For me at least, the process of updating a module (and its tests) is as follows:

  1. Copy the files from CPython
  2. Run git diff, and re-apply any patches that were present before
  3. Watch which tests fails in the CI
  4. Patch module/test accordingly

This tool aims to do steps 1 and 2, which take the majority of the time (and are very boring).

Personally, I think that a new contributor can still provide value with this approach, while also having a standard on how to update modules and tests.

Take for example this PR: #6030
It would have been clearer that it's preferred to update the entire module rather than just add a single function (and we would also know which version it was updated to 😉).

Do you think that this tool will be a gatekeeper and prevent new contributors from contributing?

Also, even if we had a completely automated way to mark failing tests, how would you deal with flaky tests (tests that pass sometimes)? You'd still have to have a centralized place where you state which tests get skipped.

Also, we currently have a problem that we cannot update some module and test files due to unsupported syntax or missing Rust modules. IMO it's better not to have completely automated test updates ATM, but to ease the update procedure for the majority of modules and tests.

I honestly don't have much of an idea about the automated way. If it is not the goal, let's not think about it too much.

Let's start by defining what an "automated way" is. If we define it as:

rm -r Lib/
cp -a /path/to/cpython_v3_13_7/Lib/ Lib/
./mark_all_failing_tests.py
git add .
git commit -m 'Updated all tests and modules to 3.13.7'
git push

Then I think that it's not practical ATM.

I think that it's more realistic ATM to have an easier migration tool with the ability to select which version each test & module is set to.

@youknowone
Member

Updating can be made less painful by doing it like this: https://github.com/RustPython/RustPython/wiki/How-to-update-test-files
Still worse than this patch, though.

Do you think that this tool will be a gatekeeper and prevent new contributors from contributing?

I wish feature contributors didn't have to care about too many things, if possible.
Yes, there is a trade-off. I will try to avoid a setup where everyone has to learn how to use the new toml files and the new tools just to fix a single bug.
On the other hand, asking library updaters to learn the new tool is totally fine, because it is too painful without tools.

I hope the new update workflow will be:

  1. Run ./generate_migration Lib/test/test_something.py > test_something.toml
  2. Copy a new library to Lib/test/test_something.py
  3. Edit test_something.toml if required.
  4. Run ./apply_migration Lib/test/test_something.py test_something.toml

An elidable file name is fine. I can see how worthwhile this is for library update tasks.

Steps 2-4 must be the same as you intended; otherwise I am misunderstanding something. But I'm not sure what you think about step 1. In my opinion, trying to skip step 1 would potentially force too many people to learn the tool.

Regardless of whether the toml files are kept in the repository or not, allowing them to be desynced will be better than forcing them to always be synced.
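For illustration, a hypothetical `test_something.toml` produced by step 1 might look like the following. The key names mirror the script's PatchSpec fields (`ut_method`, `cond`, `reason`), while the class and test names are invented:

```toml
# Hypothetical migration file; the current tool serializes to JSON, not toml.
[FileTests.test_access]
ut_method = "expectedFailure"
reason = "TODO: RUSTPYTHON"

[FileTests.test_utime]
ut_method = "skipIf"
cond = "sys.platform == 'win32'"
reason = "TODO: RUSTPYTHON; flaky on Windows"
```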

@ShaharNaveh
Collaborator Author

Updating can be made less painful by doing it like this: https://github.com/RustPython/RustPython/wiki/How-to-update-test-files Still worse than this patch, though.

Do you think that this tool will be a gatekeeper and prevent new contributors from contributing?

I wish feature contributors didn't have to care about too many things, if possible. Yes, there is a trade-off. I will try to avoid a setup where everyone has to learn how to use the new toml files and the new tools just to fix a single bug. On the other hand, asking library updaters to learn the new tool is totally fine, because it is too painful without tools.

I hope the new update workflow will be:

  1. Run ./generate_migration Lib/test/test_something.py > test_something.toml
  2. Copy a new library to Lib/test/test_something.py
  3. Edit test_something.toml if required.
  4. Run ./apply_migration Lib/test/test_something.py test_something.toml

An elidable file name is fine. I can see how worthwhile this is for library update tasks.

I like this idea, will implement it and send you a ping for review.

Steps 2-4 must be the same as you intended; otherwise I am misunderstanding something.

No, you got everything perfectly:)

But I'm not sure what you think about step 1. In my opinion, trying to skip step 1 would potentially force too many people to learn the tool.

Originally the tool wasn't supposed to update the files from CPython; that functionality was added later to demonstrate the tool's potential. It might be better to split it into a separate tool where you do something like:

./update_file.py Lib/test/test_os.py --to 'v3.13.7'

And it updates the file behind the scenes by fetching the new file from CPython and running the migration scripts (I'll do it in a future PR if desired).
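For what it's worth, the fetch part could be a thin wrapper over GitHub's raw-file URL scheme. This is only a sketch under that assumption (the actual tool may fetch differently), and the printed call only builds the URL, with no network access:

```python
import urllib.request

RAW_URL = "https://raw.githubusercontent.com/python/cpython/{tag}/{relpath}"

def cpython_raw_url(relpath: str, tag: str) -> str:
    """Build the raw-download URL for one file of the CPython repo at a git tag."""
    return RAW_URL.format(tag=tag, relpath=relpath)

def fetch_cpython_file(relpath: str, tag: str) -> str:
    """Download the file contents (requires network access)."""
    with urllib.request.urlopen(cpython_raw_url(relpath, tag)) as resp:
        return resp.read().decode("utf-8")

print(cpython_raw_url("Lib/test/test_os.py", "v3.13.7"))
```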

Regardless of whether the toml files are kept in the repository or not, allowing them to be desynced will be better than forcing them to always be synced.

If it's not going to be the source of truth, then I'd advise against saving them in the repo; I think it's better that way so it will not cause confusion. Similar to how code without comments is better than the same code with misleading comments (IMO at least).

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (7)
scripts/lib_updater.py (7)

129-133: Guard against nested attributes to prevent AttributeError

The code assumes attr_node.value is an ast.Name, but it can also be an ast.Attribute for nested decorators, causing AttributeError.

The fix using getattr is appropriate and will handle nested attributes correctly.


310-316: JSON deserialization creates strings instead of UtMethod enum values

When loading patches from JSON, ut_method remains a string which will cause the match statement in fmt() to fail silently.

According to past review comments, this issue was already identified and should be fixed by converting the string to enum:

 patches = {
     cls_name: {
-        test_name: [PatchSpec(**spec) for spec in specs]
+        test_name: [
+            PatchSpec(
+                UtMethod(spec["ut_method"]),
+                spec.get("cond"),
+                spec.get("reason", "")
+            )
+            for spec in specs
+        ]
         for test_name, specs in tests.items()
     }
     for cls_name, tests in json.loads(args.patches.read_text()).items()
 }

40-40: Fix Python 3.12-only type alias syntax for compatibility

The type statement is only available in Python 3.12+. Use the traditional type alias syntax for broader compatibility.

-type Patches = dict[str, dict[str, list["PatchSpec"]]]
+Patches = dict[str, dict[str, list["PatchSpec"]]]

170-172: Guard against non-call decorators when extracting conditions

The code assumes dec_node is an ast.Call when extracting conditions, but decorators can also be ast.Attribute nodes without arguments.

 if ut_method.has_cond():
-    cond = ast.unparse(dec_node.args[0])
+    if isinstance(dec_node, ast.Call) and dec_node.args:
+        cond = ast.unparse(dec_node.args[0])
+    else:
+        # Skip malformed decorator without condition
+        continue

84-94: Escape reason strings to avoid syntax errors when quotes appear

The fmt() method directly interpolates self.reason into quoted strings without escaping, which will generate invalid Python syntax if the reason contains quotes.

+import json  # Add at the top of the file
+
 def fmt(self) -> str:
     prefix = f"@unittest.{self.ut_method}"
     match self.ut_method:
         case UtMethod.ExpectedFailure:
             line = f"{prefix} # {COMMENT}; {self.reason}"
         case UtMethod.ExpectedFailureIfWindows | UtMethod.Skip:
-            line = f'{prefix}("{COMMENT}; {self.reason}")'
+            msg = f"{COMMENT}; {self.reason}".rstrip(";").strip()
+            line = f"{prefix}({json.dumps(msg)})"
         case UtMethod.SkipIf | UtMethod.SkipUnless | UtMethod.ExpectedFailureIf:
-            line = f'{prefix}({self.cond}, "{COMMENT}; {self.reason}")'
+            msg = f"{COMMENT}; {self.reason}".rstrip(";").strip()
+            line = f"{prefix}({self.cond}, {json.dumps(msg)})"
 
-    return line.strip().rstrip(";").strip()
+    return line.strip()

11-13: Documentation mismatch: clarify that omitting -o prints to stdout

The documentation incorrectly suggests that running without -o will override the --from path, but the code actually prints to stdout when -o is omitted.

-You can run the same command without `-o` to override the `--from` path:
+You can run the same command without `-o` to print to stdout (use `-o` flag to actually write to a file):
 
->>> ./{fname} --from Lib/test/foo.py --to ~/cpython/Lib/test/foo.py
+>>> ./{fname} --from Lib/test/foo.py --to ~/cpython/Lib/test/foo.py  # prints to stdout

143-152: Fix broken slice for expectedFailure reason extraction

The slice lines[dec_node.lineno - 1 : dec_node.lineno - 3 : -1] with negative step produces an empty range. Additionally, the regex doesn't match the format generated by fmt() which uses semicolons.

 case UtMethod.ExpectedFailure:
-    # Search first on decorator line, then in the line before
-    for line in lines[
-        dec_node.lineno - 1 : dec_node.lineno - 3 : -1
-    ]:
-        if COMMENT not in line:
-            continue
-        reason = "".join(re.findall(rf"{COMMENT} (.*)", line))
-        break
-    else:
-        continue
+    # Search decorator line, then the previous line
+    candidates = []
+    if 0 <= dec_node.lineno - 1 < len(lines):
+        candidates.append(lines[dec_node.lineno - 1])  # decorator line
+    if 0 <= dec_node.lineno - 2 < len(lines):
+        candidates.append(lines[dec_node.lineno - 2])  # previous line
+    
+    reason = ""
+    for line in candidates:
+        if COMMENT in line:
+            # Match both space and semicolon separators
+            m = re.search(rf"{re.escape(COMMENT)}\s*[:;]?\s*(.*)$", line)
+            if m:
+                reason = m.group(1).strip()
+                break
+    
+    if not reason:
+        continue
🧹 Nitpick comments (2)
scripts/lib_updater.py (2)

252-254: Typo in comment: "distrupt" should be "disrupt"

Minor typo in the comment.

-    # Going in reverse to not distrupt the line offset
+    # Going in reverse to not disrupt the line offset

236-244: Consider improving indentation flexibility for inherited tests

The hardcoded indentation for inherited test methods might not match the actual indentation style of the target file.

Consider deriving the indentation from the existing class methods or making it configurable:

+    # Derive indentation from existing methods in the class
+    class_indent = " " * (cls_node.col_offset if cls_node else COL_OFFSET)
+    method_indent = class_indent + INDENT1
+    body_indent = method_indent + INDENT1
+    
     for test_name, specs in tests.items():
-        patch_lines = "\n".join(f"{INDENT1}{spec.fmt()}" for spec in specs)
+        patch_lines = "\n".join(f"{method_indent}{spec.fmt()}" for spec in specs)
         yield (
             lineno,
             f"""

-{patch_lines}
-{INDENT1}def {test_name}(self):
-{INDENT2}return super().{test_name}()
+{patch_lines}
+{method_indent}def {test_name}(self):
+{body_indent}return super().{test_name}()
 """.rstrip(),
         )
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3e9872c and 6c615cc.

📒 Files selected for processing (1)
  • scripts/lib_updater.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/lib_updater.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: ShaharNaveh
PR: RustPython/RustPython#6089
File: scripts/lib_updater.py:292-297
Timestamp: 2025-08-30T14:40:05.858Z
Learning: In scripts/lib_updater.py, the --inplace flag intentionally writes to orig_file (not remote_file) even though patches are applied to remote_file content. This workflow allows updating the original RustPython test file with patches applied to new upstream CPython content.
📚 Learning: 2025-08-30T14:40:05.858Z
Learnt from: ShaharNaveh
PR: RustPython/RustPython#6089
File: scripts/lib_updater.py:292-297
Timestamp: 2025-08-30T14:40:05.858Z
Learning: In scripts/lib_updater.py, the --inplace flag intentionally writes to orig_file (not remote_file) even though patches are applied to remote_file content. This workflow allows updating the original RustPython test file with patches applied to new upstream CPython content.

Applied to files:

  • scripts/lib_updater.py
📚 Learning: 2025-07-19T03:16:56.511Z
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : NEVER comment out or delete any test code lines except for removing `unittest.expectedFailure` decorators and upper TODO comments in test files

Applied to files:

  • scripts/lib_updater.py
📚 Learning: 2025-07-19T03:16:56.511Z
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : The only acceptable modifications to test files are: 1. Removing `unittest.expectedFailure` decorators and the upper TODO comments when tests actually pass 2. Adding `unittest.expectedFailure` decorators when tests cannot be fixed

Applied to files:

  • scripts/lib_updater.py
🪛 Flake8 (7.2.0)
scripts/lib_updater.py

[error] 40-40: SyntaxError: invalid syntax

(E999)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check Rust code with rustfmt and clippy
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (2)
scripts/lib_updater.py (2)

332-333: Verify the intentional write-to-original workflow

Based on the retrieved learning, the tool intentionally applies patches from args.to (remote file) and writes the result to args.output (which defaults to stdout). This supports the workflow of updating RustPython test files with patches applied to new upstream CPython content.

The implementation correctly supports the documented workflow where patches are applied to upstream content and can be written back to the original file using the -o flag.


260-267: Clean implementation of output handling with stdout support

The function properly handles both file output and stdout (using "-" convention), which aligns well with Unix conventions.

@ShaharNaveh
Collaborator Author

@ShaharNaveh @reactive-firewall What do you think about the next step?

For me at least, it is to keep using this tool to update some tests and find edge cases that haven't been encountered until now.

@reactive-firewall reactive-firewall left a comment

👍 LGTM!

@reactive-firewall

A prototype of using this new tool in CI (e.g., to check for untouched tests/regressions) went smoothly. I'd personally say the update tool is ready to merge.


@ShaharNaveh @reactive-firewall What do you think about the next step?

For me at least, it is to keep using this tool to update some tests and find edge cases that haven't been encountered until now.

@ShaharNaveh the main edge case I have run into is that when the user provides a bad path (e.g., a real FileNotFoundError), the exception is not handled well, e.g.,

patches = build_patch_dict(iter_patches(args.gather_from.read_text()))

throws unhandled FileNotFoundError
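For illustration, one way to surface a bad path as a clean usage error instead of a traceback (a sketch only; the current script intentionally lets the exception propagate, and the `--gather-from` flag name is reused here purely as an example) is an argparse `type=` hook:

```python
import argparse
import pathlib

def existing_file(value: str) -> pathlib.Path:
    """argparse `type=` hook: reject paths that do not point at a file."""
    path = pathlib.Path(value)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"{value!r} is not an existing file")
    return path

parser = argparse.ArgumentParser(prog="lib_updater.py")
parser.add_argument("--gather-from", type=existing_file)

try:
    parser.parse_args(["--gather-from", "/no/such/file.py"])
except SystemExit as exc:
    # argparse reports a short usage error on stderr and exits with status 2
    print("exit status:", exc.code)
```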

TL;DR - Regarding undiscussed edge-cases

Also consider taking a look at the edge cases in Lib/test/test_future_stmt/badsyntax_future*.py (which won't run with RustPython unittests even with the fix_tests script -- I'm not sure what to do with whole files that fail to parse myself 🤷, but they could probably be patched with some fencing).

Tip

Consider searching the output of this test run for the term "inconclusive" to highlight all kinds of edge cases (including tests that hang with RustPython).

More importantly: Great work @ShaharNaveh & thanks for this tool!

Hope this helps 🙇


@youknowone

TL;DR - in general

Next steps might be to mark more "patches", and perhaps build a counterpart tool that can bring over trivially changed reference lib modules based on a given reference Python version, possibly even applying some kind of patch per function definition like the test tool does, but that is getting ahead of things for now. My rationale is that this would let RustPython target each reference Python implementation independently (e.g., main could still focus on the latest stable 3.13 series while, hopefully, any work-in-progress for other reference versions could also have a path toward support, without conflicts) 🤷; for now it would only need to target one version at a time, and that would be future-proof enough for tracking.

in my corner
However, I am more interested in the CI/CD infrastructure and, in my spare time, hope to look into shoring up the reporting of automated compliance testing with each reference version (e.g., kind of like the whatsleft cronjob CI, but for reference-compliance visibility instead), centered around this tool. I will ping you on a separate PR sometime after this tool is merged into main (so that the CI reporting workflow can rely on it in RustPython main).


Ref: GHI #5974

@ShaharNaveh
Collaborator Author

@ShaharNaveh the main edge case I have run into is that when the user provides a bad path (e.g., a real FileNotFoundError), the exception is not handled well, e.g.,

patches = build_patch_dict(iter_patches(args.gather_from.read_text()))

throws unhandled FileNotFoundError

This is intended. What would you expect to get as a user instead?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
scripts/lib_updater.py (4)

170-172: Guard cond extraction against non-call decorators

This can raise when the decorator is not a call or has no args.

-                        if ut_method.has_cond():
-                            cond = ast.unparse(dec_node.args[0])
+                        if ut_method.has_cond():
+                            if isinstance(dec_node, ast.Call) and dec_node.args:
+                                cond = ast.unparse(dec_node.args[0])
+                            else:
+                                # Malformed/missing condition; skip safely.
+                                continue

84-95: Escape reason strings to generate valid Python for decorators

Interpolating reason directly can break when it contains quotes or backslashes. Also no need to strip trailing semicolons from the rendered line.

     def fmt(self) -> str:
         prefix = f"@unittest.{self.ut_method}"
         match self.ut_method:
             case UtMethod.ExpectedFailure:
                 line = f"{prefix} # {COMMENT}; {self.reason}"
             case UtMethod.ExpectedFailureIfWindows | UtMethod.Skip:
-                line = f'{prefix}("{COMMENT}; {self.reason}")'
+                msg = f"{COMMENT}; {self.reason}".strip().rstrip(";")
+                line = f"{prefix}({json.dumps(msg)})"
             case UtMethod.SkipIf | UtMethod.SkipUnless | UtMethod.ExpectedFailureIf:
-                line = f'{prefix}({self.cond}, "{COMMENT}; {self.reason}")'
+                msg = f"{COMMENT}; {self.reason}".strip().rstrip(";")
+                line = f"{prefix}({self.cond}, {json.dumps(msg)})"
 
-        return line.strip().rstrip(";").strip()
+        return line.strip()

309-316: Coerce ut_method from JSON to UtMethod to avoid runtime errors in fmt()

PatchSpec(**spec) leaves ut_method as str; match self.ut_method won’t hit enum cases and line stays undefined.

-    if args.patches:
-        patches = {
-            cls_name: {
-                test_name: [PatchSpec(**spec) for spec in specs]
-                for test_name, specs in tests.items()
-            }
-            for cls_name, tests in json.loads(args.patches.read_text()).items()
-        }
+    if args.patches:
+        raw = json.loads(args.patches.read_text())
+        patches = {
+            cls_name: {
+                test_name: [
+                    PatchSpec(
+                        UtMethod(spec["ut_method"]),
+                        spec.get("cond"),
+                        spec.get("reason", ""),
+                    )
+                    for spec in specs
+                ]
+                for test_name, specs in tests.items()
+            }
+            for cls_name, tests in raw.items()
+        }

11-19: Fix docs: omission of -o prints to stdout, it does not overwrite files

The example currently suggests it “overrides the --from path”. Clarify usage and show redirection if desired.

-You can run the same command without `-o` to override the `--from` path:
+You can run the same command without `-o` to print to stdout (use shell redirection `>` if you want to overwrite a file):
@@
->>> ./{fname} --from Lib/test/foo.py --to ~/cpython/Lib/test/foo.py
+>>> ./{fname} --from Lib/test/foo.py --to ~/cpython/Lib/test/foo.py > Lib/test/foo.py

Also remove the stray space before ~/cpython on Line 7.

-To move the patches found in `Lib/test/foo.py` to ` ~/cpython/Lib/test/foo.py` then write the contents back to `Lib/test/foo.py`
+To move the patches found in `Lib/test/foo.py` to `~/cpython/Lib/test/foo.py` then write the contents back to `Lib/test/foo.py`
🧹 Nitpick comments (4)
scripts/lib_updater.py (4)

251-254: Sort modifications by line number only to avoid incidental ordering by patch text

Minor stability improvement; avoids reordering when two entries share the same line.

-    for lineno, patch in sorted(modifications, reverse=True):
+    for lineno, patch in sorted(modifications, key=lambda m: m[0], reverse=True):

229-229: Typo: “inhereted” → “inherited”

-    # Phase 2: Iterate and mark inhereted tests
+    # Phase 2: Iterate and mark inherited tests

260-267: Open output file with explicit UTF-8 encoding

Prevents locale-dependent behavior across platforms.

-    with open(dest, "w") as fd:
+    with open(dest, "w", encoding="utf-8") as fd:
         fd.write(data)

115-121: Rename iter_patch_entiresiter_patch_entries (spelling) and update call site

Improves readability; internal API only.

-class PatchEntry(typing.NamedTuple):
+class PatchEntry(typing.NamedTuple):
@@
-    def iter_patch_entires(
+    def iter_patch_entries(
         cls, tree: ast.Module, lines: list[str]
     ) -> "Iterator[typing.Self]":

And:

-    yield from PatchEntry.iter_patch_entires(tree, lines)
+    yield from PatchEntry.iter_patch_entries(tree, lines)

Also applies to: 198-202

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c615cc and c0e90cc.

📒 Files selected for processing (1)
  • scripts/lib_updater.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/lib_updater.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: ShaharNaveh
PR: RustPython/RustPython#6089
File: scripts/lib_updater.py:292-297
Timestamp: 2025-08-30T14:40:05.858Z
Learning: In scripts/lib_updater.py, the --inplace flag intentionally writes to orig_file (not remote_file) even though patches are applied to remote_file content. This workflow allows updating the original RustPython test file with patches applied to new upstream CPython content.
📚 Learning: 2025-08-30T14:40:05.858Z
Learnt from: ShaharNaveh
PR: RustPython/RustPython#6089
File: scripts/lib_updater.py:292-297
Timestamp: 2025-08-30T14:40:05.858Z
Learning: In scripts/lib_updater.py, the --inplace flag intentionally writes to orig_file (not remote_file) even though patches are applied to remote_file content. This workflow allows updating the original RustPython test file with patches applied to new upstream CPython content.

Applied to files:

  • scripts/lib_updater.py
📚 Learning: 2025-07-19T03:16:56.511Z
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : NEVER comment out or delete any test code lines except for removing `unittest.expectedFailure` decorators and upper TODO comments in test files

Applied to files:

  • scripts/lib_updater.py
📚 Learning: 2025-07-19T03:16:56.511Z
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : The only acceptable modifications to test files are: 1. Removing `unittest.expectedFailure` decorators and the upper TODO comments when tests actually pass 2. Adding `unittest.expectedFailure` decorators when tests cannot be fixed

Applied to files:

  • scripts/lib_updater.py
🪛 Flake8 (7.2.0)
scripts/lib_updater.py

[error] 40-40: SyntaxError: invalid syntax

(E999)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Check Rust code with rustfmt and clippy

if typing.TYPE_CHECKING:
from collections.abc import Iterator

type Patches = dict[str, dict[str, list["PatchSpec"]]]
Contributor

@coderabbitai coderabbitai bot Sep 6, 2025

⚠️ Potential issue

Fix Python syntax: replace Python 3.12 type alias with TypeAlias

This is a SyntaxError under flake8 (E999). Use a backwards-compatible alias.

+from typing import TypeAlias
@@
-type Patches = dict[str, dict[str, list["PatchSpec"]]]
+Patches: TypeAlias = dict[str, dict[str, list["PatchSpec"]]]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
type Patches = dict[str, dict[str, list["PatchSpec"]]]
# At the top of the file, add:
from typing import TypeAlias
# ...later, replace the Python 3.12-only alias:
Patches: TypeAlias = dict[str, dict[str, list["PatchSpec"]]]
🧰 Tools
🪛 Flake8 (7.2.0)

[error] 40-40: SyntaxError: invalid syntax

(E999)

🤖 Prompt for AI Agents
In scripts/lib_updater.py around line 40, the use of the Python 3.12 "type"
alias syntax causes a SyntaxError under flake8; replace it with a
backwards-compatible TypeAlias: add "from typing import TypeAlias" to the
imports (or include it in the existing typing import) and change the alias
declaration to use TypeAlias (e.g. Patches: TypeAlias = dict[str, dict[str,
list['PatchSpec']]]), ensuring any forward-referenced names remain quoted.

Collaborator Author

This is fine and can be ignored

Contributor

@ShaharNaveh, understood! I'll note that this syntax issue can be ignored for this tool.


🧠 Learnings used
Learnt from: ShaharNaveh
PR: RustPython/RustPython#6089
File: scripts/lib_updater.py:292-297
Timestamp: 2025-08-30T14:40:05.858Z
Learning: In scripts/lib_updater.py, the --inplace flag intentionally writes to orig_file (not remote_file) even though patches are applied to remote_file content. This workflow allows updating the original RustPython test file with patches applied to new upstream CPython content.

Comment on lines +198 to +202
def iter_patches(contents: str) -> "Iterator[PatchEntry]":
lines = contents.splitlines()
tree = ast.parse(contents)
yield from PatchEntry.iter_patch_entires(tree, lines)

Contributor

@coderabbitai coderabbitai bot Sep 6, 2025

🛠️ Refactor suggestion

Handle unparseable files gracefully (bad-syntax tests)

ast.parse can raise SyntaxError on intentionally invalid test files; fail gracefully with a clear message.

-def iter_patches(contents: str) -> "Iterator[PatchEntry]":
-    lines = contents.splitlines()
-    tree = ast.parse(contents)
-    yield from PatchEntry.iter_patch_entires(tree, lines)
+def iter_patches(contents: str) -> "Iterator[PatchEntry]":
+    lines = contents.splitlines()
+    try:
+        tree = ast.parse(contents)
+    except SyntaxError as e:
+        print(f"ERROR: could not parse source for patch extraction: {e}", file=sys.stderr)
+        return
+    yield from PatchEntry.iter_patch_entires(tree, lines)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def iter_patches(contents: str) -> "Iterator[PatchEntry]":
lines = contents.splitlines()
tree = ast.parse(contents)
yield from PatchEntry.iter_patch_entires(tree, lines)
def iter_patches(contents: str) -> "Iterator[PatchEntry]":
lines = contents.splitlines()
try:
tree = ast.parse(contents)
except SyntaxError as e:
print(f"ERROR: could not parse source for patch extraction: {e}", file=sys.stderr)
return
yield from PatchEntry.iter_patch_entires(tree, lines)
🤖 Prompt for AI Agents
In scripts/lib_updater.py around lines 198 to 202, ast.parse(contents) can raise
SyntaxError for intentionally invalid test files; wrap the parse call in a
try/except that catches SyntaxError and handle it gracefully by raising a
clearer ValueError (or RuntimeError) with a human-readable message that includes
the original exception details (or alternatively return an empty iterator),
preserving the function signature and ensuring the original exception is chained
for debugging.

Collaborator Author

This is fine and intended. you can ignore it

Contributor

@ShaharNaveh Understood! I'll ignore the error handling suggestion. The current behavior of letting ast.parse() raise SyntaxError directly is intentional for your use case.


✏️ Learnings added
Learnt from: ShaharNaveh
PR: RustPython/RustPython#6089
File: scripts/lib_updater.py:198-202
Timestamp: 2025-09-07T05:38:31.644Z
Learning: In scripts/lib_updater.py, the iter_patches function intentionally does not handle SyntaxError from ast.parse(contents). The author confirmed this behavior is fine and intended - the tool should fail fast on unparseable files rather than silently skip processing.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : When tests fail due to unsupported syntax, keep the test as `unittest.expectedFailure`, document that it requires PEP 695 support, and focus on tests that can be fixed through Rust code changes only

Learnt from: ShaharNaveh
PR: RustPython/RustPython#6089
File: scripts/lib_updater.py:292-297
Timestamp: 2025-08-30T14:40:05.858Z
Learning: In scripts/lib_updater.py, the --inplace flag intentionally writes to orig_file (not remote_file) even though patches are applied to remote_file content. This workflow allows updating the original RustPython test file with patches applied to new upstream CPython content.

Comment on lines +212 to +218
def iter_patch_lines(tree: ast.Module, patches: Patches) -> "Iterator[tuple[int, str]]":
cache = {} # Used in phase 2

# Phase 1: Iterate and mark existing tests
for cls_node, fn_node in iter_tests(tree):
cache[cls_node.name] = cls_node.end_lineno
specs = patches.get(cls_node.name, {}).pop(fn_node.name, None)
Contributor

🛠️ Refactor suggestion

Respect class indentation when adding inherited test wrappers

Using a fixed 4-space indent can produce invalid indentation for nested/indented classes. Cache both end_lineno and col_offset and compute indent dynamically.

-def iter_patch_lines(tree: ast.Module, patches: Patches) -> "Iterator[tuple[int, str]]":
-    cache = {}  # Used in phase 2
+def iter_patch_lines(tree: ast.Module, patches: Patches) -> "Iterator[tuple[int, str]]":
+    # cache: {class_name: (end_lineno, col_offset)}
+    cache: dict[str, tuple[int, int]] = {}
@@
-    for cls_node, fn_node in iter_tests(tree):
-        cache[cls_node.name] = cls_node.end_lineno
+    for cls_node, fn_node in iter_tests(tree):
+        cache[cls_node.name] = (cls_node.end_lineno, cls_node.col_offset)
@@
-    # Phase 2: Iterate and mark inhereted tests
+    # Phase 2: Iterate and mark inherited tests
     for cls_name, tests in patches.items():
-        lineno = cache.get(cls_name)
-        if not lineno:
+        cached = cache.get(cls_name)
+        if not cached:
             print(f"WARNING: {cls_name} does not exist in remote file", file=sys.stderr)
             continue
+        lineno, cls_col = cached
         for test_name, specs in tests.items():
-            patch_lines = "\n".join(f"{INDENT1}{spec.fmt()}" for spec in specs)
+            indent1 = " " * (cls_col + COL_OFFSET)
+            indent2 = indent1 + (" " * COL_OFFSET)
+            patch_lines = "\n".join(f"{indent1}{spec.fmt()}" for spec in specs)
             yield (
                 lineno,
                 f"""
 {patch_lines}
-{INDENT1}def {test_name}(self):
-{INDENT2}return super().{test_name}()
+{indent1}def {test_name}(self):
+{indent2}return super().{test_name}()
 """.rstrip(),
             )

Also applies to: 229-244

🤖 Prompt for AI Agents
In scripts/lib_updater.py around lines 212 to 218 (and also apply changes to
229-244), the code currently caches only end_lineno and uses a fixed 4-space
indent when inserting inherited test wrappers which breaks indentation for
nested classes; update the cache to store both end_lineno and col_offset for
each class, compute the insertion indent from the stored col_offset (e.g., use
that many spaces) instead of a hard-coded 4 spaces, and use that dynamic indent
when generating/adding wrapper function lines so inserted wrappers match the
class's actual indentation level.
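The fix above assumes `ast.ClassDef` nodes carry both `end_lineno` and `col_offset`; a quick standalone check of what those fields hold for a nested class:

```python
import ast

src = """\
class Outer:
    class Inner:
        def test_x(self):
            pass
"""

tree = ast.parse(src)
# Map each class to (col_offset, end_lineno): col_offset is the column
# where the `class` keyword starts, so the nested class reports 4 here.
info = {
    node.name: (node.col_offset, node.end_lineno)
    for node in ast.walk(tree)
    if isinstance(node, ast.ClassDef)
}
print(info)  # {'Outer': (0, 4), 'Inner': (4, 4)}
```

This is why the suggested diff derives the wrapper indentation from `col_offset` plus the per-level offset instead of hard-coding four spaces.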

Member
@youknowone youknowone left a comment

👍 Thank you so much!

@youknowone youknowone merged commit c5c2bd0 into RustPython:main Sep 7, 2025
12 checks passed
@ShaharNaveh ShaharNaveh deleted the auto-updater branch September 7, 2025 07:09
@reactive-firewall

@ShaharNaveh

What would you expect to get as a user instead?

TL;DR: I expected a mention of the "--from" argument; as this is a developer tool, the intentional fail-fast behavior is also reasonable.

Consider adding a message:

    else:
        try:
            patches = build_patch_dict(iter_patches(args.gather_from.read_text()))
        except FileNotFoundError as _cause:
            raise FileNotFoundError("--from MUST specify an existing file-path") from _cause

Or, even better, consider handling the exception by aborting outright; e.g.:

    else:
        try:
            if args.gather_from:  # simple truthy check
                patches = build_patch_dict(iter_patches(args.gather_from.read_text()))
            else:  # falsy should panic
                raise FileNotFoundError("--from MUST specify an existing file-path") from None
        except FileNotFoundError as _cause:
            if sys.stderr.isatty():  # heuristic - probably run interactively by user
                print(
                    "ERROR - file or directory was not found; --from MUST specify an existing file-path",
                    file=sys.stderr,
                )
                sys.exit(66) # heuristic - common POSIX tool code for File-not-found
            else:
                raise FileNotFoundError("--from MUST specify an existing file-path") from _cause
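Another option is to validate the path at argument-parsing time instead of at read time; a minimal sketch assuming an argparse-based CLI (the real script's option names may differ):

```python
import argparse
import pathlib


def existing_file(value: str) -> pathlib.Path:
    """argparse `type` callback that rejects non-existent paths up front."""
    path = pathlib.Path(value)
    if not path.is_file():
        # argparse reports this as a usage error and exits with code 2.
        raise argparse.ArgumentTypeError(f"{value!r} is not an existing file")
    return path


parser = argparse.ArgumentParser()
parser.add_argument("--from", dest="gather_from", type=existing_file)
```

This front-loads the check so `read_text()` never sees a missing file, at the cost of argparse's generic exit code 2 rather than a sysexits-style 66.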

reactive-firewall added a commit to reactive-firewall/CICD-Patch-RustPython-5974 that referenced this pull request Sep 28, 2025
…n#6089

* cc:  @youknowone @ShaharNaveh

* demo of how to automate RustPython Auto-Patching and upload required testing patches (first attempt, YOLO)
* uses ShaharNaveh:auto-updater for rpau (see PR RustPython#6089) (Auto Patching and config)
* uses automation idea from GHI RustPython#5974 (reporting and automation)
reactive-firewall added further commits to reactive-firewall/CICD-Patch-RustPython-5974 that referenced this pull request Sep 28, 2025
* see GHI RustPython#5974 and PR RustPython#6089
