rjurney commented on Mar 13, 2026
- New, big, fancy, super, duper Pregel tutorial
- Moved Stack Exchange data content from Network Motif Finding Tutorial into Data Setup tutorial. Refer to from both motif and Pregel tutorials.
- Point at new tutorial(s) from list of tutorials.
- New network motif and Pregel tutorial Jupyter notebooks
- Some other minor changes...
…s.txt and split out requirements-dev.txt. Version bumps.
…ney/build-upgrades
…ney/build-upgrades
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
rjurney
left a comment
Need to make all changes suggested by @copilot, @rjurney or @SemyonSinchenko
@SemyonSinchenko Well, it had a bunch of non-conforming stuff for sure. Look at the changes.
Single quotes, etc. pre-commit usually runs only on the files changed in a commit. We do want these fixes, right?
On Fri, Mar 13, 2026 at 11:45 AM Sem wrote, commenting on python/dev/build_jar.py in this pull request:
pre-commit (python formatting) is "green" in main
- Wrap pregel.py in main() with if __name__ guard to prevent side effects on import
- Remove unused msg_to_dst variable in pregel.py Example 1
- Remove numpy dependency (not used anywhere in codebase)
- Add mmdc to tutorials extras (required by generate_diagrams.py)
- Restore Emacs gitignore entries, remove Anaconda entries
- Delete extraneous files: BLOG_POST.md, RESEARCH.md, VERIFICATION.md, WRITING_STYLE.md
- Regenerate poetry.lock after dependency changes
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
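A minimal sketch of the import-guard pattern named in the first bullet. The function names and the stand-in computation are hypothetical and are not taken from pregel.py; the point is only that nothing runs at import time.

```python
"""Sketch of a tutorial script that is safe to import (hypothetical contents)."""


def run_example() -> int:
    # Stand-in for the real tutorial work (building a graph, running Pregel).
    return sum(range(5))


def main() -> None:
    # All side effects (printing, starting Spark, reading data) live here.
    print(f"Example result: {run_example()}")


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    main()
```

With this shape, `import` of the module only defines functions, so test code and documentation tooling can load it without triggering the tutorial run.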
@cursoragent please review again
@claude review
Pull request overview
Adds a new Pregel tutorial (docs + runnable script + diagrams) and refactors Stack Exchange data setup documentation/CLI to support the tutorial workflow, while also updating various formatting/configuration files across the Python and docs toolchain.
Changes:
- Introduces a comprehensive Pregel tutorial (Markdown + Python script) and supporting Mermaid→SVG diagram generation.
- Adds a standalone “Data Setup” tutorial and updates the Motif tutorial to reference it.
- Updates Python packaging/config (dependencies, optional extras, tooling versions) and docs/build tweaks (Laika benchmarks fallback, Python API link).
Reviewed changes
Copilot reviewed 21 out of 34 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| python/tests/test_graphframes.py | Formatting-only tweaks in tests. |
| python/pyproject.toml | Adds core deps + tutorials optional deps; bumps dev tool versions; updates Black config. |
| python/graphframes/tutorials/stackexchange.py | Updates usage header/comments for Spark 3.5 vs 4.x XML handling. |
| python/graphframes/tutorials/pregel.py | New runnable Pregel tutorial script with 7 examples. |
| python/graphframes/tutorials/motif.py | Updates header usage for Spark 3.5 vs 4.x GraphFrames artifacts. |
| python/graphframes/tutorials/generate_diagrams.py | New Mermaid diagram renderer producing tutorial SVG assets. |
| python/graphframes/tutorials/download.py | Changes CLI option naming/default folder for StackExchange downloader. |
| python/graphframes/connect/proto/graphframes_pb2.py | Formatting-only tweak in generated proto file. |
| python/graphframes/connect/graphframes_client.py | Formatting-only tweak for argument wrapping. |
| python/docs/underscores.py | Formatting and minor refactor of Sphinx extension helpers. |
| python/docs/epytext.py | Formatting and minor refactor of Sphinx extension helpers. |
| python/docs/conf.py | Formatting/quoting cleanup in Sphinx config. |
| python/dev/build_jar.py | Minor formatting simplification. |
| project/LaikaCustoms.scala | Adds fallback benchmark config values when benchmarks file is missing; updates Python API link. |
| docs/src/img/pregel-diagrams/*.svg | Adds generated SVG diagrams used by the Pregel tutorial. |
| docs/src/03-tutorials/05-neo4j-tutorial.md | Adds a Neo4j “validation summary” tutorial page. |
| docs/src/03-tutorials/04-pregel-tutorial.md | Adds the full Pregel tutorial content (large). |
| docs/src/03-tutorials/03-data-setup.md | Adds a dedicated StackExchange data setup tutorial. |
| docs/src/03-tutorials/02-motif-tutorial.md | Updates motif tutorial intro and data setup instructions/links. |
| docs/src/03-tutorials/01-tutorials.md | Adds links for the new Data Setup and Pregel tutorials. |
@@ -61,17 +73,19 @@ enable = true
 "Bug Tracker" = "https://github.com/graphframes/graphframes/issues"

 [tool.poetry.group.dev.dependencies]
-black = "^23.12.1"
-flake8 = "^7.1.1"
-isort = "^6.0.0"
+black = "^25.12.0"
+flake8 = "^7.3.0"
+isort = "^7.0.0"
 pyspark = { version = ">=3.5, <4.1", extras = ["connect"] }
 grpcio = "<=1.67.1"
-pytest = "^8.3.4"
+pytest = "^9.0.2"
+pre-commit = "^4.5.1"

 [tool.poetry.group.tutorials.dependencies]
-py7zr = "^0.22.0"
-requests = "^2.32.4"
-click = "^8.1.8"
+py7zr = "^1.1.0"
+requests = "^2.32.5"
+click = "^8.3.1"
+mmdc = ">=0.4.0"
@SemyonSinchenko shit, I closed this....
@rjurney Could you please remove everything related to reformatting the whole codebase from this PR? It is a) hard to review and b) going to create a huge merge conflict.
I will do it myself before the release, after we merge everything. Does that sound OK to you?
Quite a reasonable request :)
Fixes #810
- Move click to core [project] dependencies (fixes broken CLI on base install)
- Lazy-import tutorials.download in console.py (py7zr/requests stay optional)
- Revert download.py to --data-dir with package-relative default path
- Add --data-dir CLI option to pregel.py, motif.py, stackexchange.py
- Wrap motif.py and stackexchange.py in main() with if __name__ guard
- All tutorial scripts now use consistent --data-dir for data path
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
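The two ideas in this commit, a `--data-dir` option with a package-relative default and a lazy import that keeps optional dependencies optional, can be sketched as below. This is an illustration under assumptions: it uses `argparse` from the standard library as a stand-in for the click-based CLI in the PR, the `data` folder name is hypothetical, and `json` stands in for an optional package such as py7zr or requests.

```python
import argparse
from pathlib import Path

# Hypothetical default: a "data" folder next to this file (package-relative),
# so the default does not depend on the current working directory.
try:
    _HERE = Path(__file__).resolve().parent
except NameError:  # e.g. an interactive session where __file__ is undefined
    _HERE = Path.cwd()
DEFAULT_DATA_DIR = _HERE / "data"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Tutorial script sketch")
    parser.add_argument(
        "--data-dir",
        type=Path,
        default=DEFAULT_DATA_DIR,
        help="Folder that holds the downloaded tutorial data",
    )
    return parser


def download(data_dir: Path) -> str:
    # Lazy import: the module is loaded only when this command actually runs,
    # so a base install without the optional package can still import the CLI.
    import json  # stand-in for an optional dependency

    return json.dumps({"data_dir": str(data_dir)})


def main(argv=None) -> Path:
    # argv=None makes argparse read sys.argv; tests can pass a list instead.
    args = build_parser().parse_args(argv)
    return args.data_dir
```

For example, `main(["--data-dir", "/tmp/stackexchange"])` returns that path, while `main([])` falls back to the package-relative default.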
- Use regex with negative lookahead in generate_diagrams.py to avoid mutating multi-digit stroke-width values (e.g., stroke-width:10)
- Remove neo4j tutorial (not part of this PR per reviewer)
- Update data-setup tutorial to document --data-dir CLI option
- Update pregel and motif tutorial docs to use data_dir path pattern
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
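A short sketch of why the negative lookahead matters. The specific rewrite (thickening 1px strokes to 2px) is a hypothetical example; the actual substitution in generate_diagrams.py is not shown in this conversation.

```python
import re

# Match "stroke-width:1" only when the 1 is the whole number:
# (?![\d.]) rejects a following digit or decimal point, so 10 and 1.5 are skipped.
THIN_STROKE = re.compile(r"stroke-width:\s*1(?![\d.])")


def thicken(svg: str) -> str:
    """Rewrite 1px strokes to 2px without touching other widths."""
    return THIN_STROKE.sub("stroke-width:2", svg)


def thicken_naive(svg: str) -> str:
    """Plain string replacement, shown only to illustrate the bug."""
    return svg.replace("stroke-width:1", "stroke-width:2")


sample = "a{stroke-width:1px} b{stroke-width:10px}"
fixed = thicken(sample)        # only the first rule changes
broken = thicken_naive(sample)  # also turns 10px into 20px
```

The naive version silently corrupts the second rule, which is the multi-digit mutation the commit message describes.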
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
@SemyonSinchenko I updated https://github.com/rjurney/graphframes/tree/rjurney/pypi-tutorials fixing all the issues save one I asked you about, but it isn't updating this PR or running tests... I guess it is slack today.
Okay, there it goes!
The Jupyter notebooks in python/graphframes/tutorials/notebooks/ reference images via relative paths like ../img/... which resolve to python/graphframes/tutorials/img/. However, the actual images live in docs/src/img/. Add a symlink python/graphframes/tutorials/img -> ../../../docs/src/img so all 15 image references in the notebooks (Network_Motif_Finding.ipynb and Pregel.ipynb) resolve correctly without duplicating files.
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
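The path arithmetic behind the symlink can be checked with a throwaway directory tree. This is a sketch that assumes a POSIX-style filesystem where unprivileged symlinks are allowed; the file name `example.svg` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def link_images(repo_root: Path) -> Path:
    """Create tutorials/img as a relative symlink to docs/src/img."""
    link = repo_root / "python" / "graphframes" / "tutorials" / "img"
    link.parent.mkdir(parents=True, exist_ok=True)
    # A relative target is resolved from the directory that holds the link,
    # so three levels up from tutorials/ lands at the repository root.
    target = os.path.join("..", "..", "..", "docs", "src", "img")
    os.symlink(target, link, target_is_directory=True)
    return link


def demo() -> bool:
    """Build a temporary repo layout and check a notebook-style path resolves."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        images = root / "docs" / "src" / "img"
        images.mkdir(parents=True)
        (images / "example.svg").write_text("<svg/>")  # hypothetical image
        notebooks = root / "python" / "graphframes" / "tutorials" / "notebooks"
        notebooks.mkdir(parents=True)
        link_images(root)
        # Same shape as the notebooks' references: ../img/<file>
        return (notebooks / ".." / "img" / "example.svg").exists()
```

Calling `demo()` confirms that a `../img/...` reference from the notebooks folder reaches the file stored under docs/src/img.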
Commit ec1b084 accidentally removed the connected components diagram from the Connected Components section of the Pregel tutorial. The SVG file exists at docs/src/img/pregel-diagrams/pregel-connected-components.svg and should be shown like all other example sections in the tutorial. The original commit was mislabeled as removing a 'Mermaid diagram' but actually removed a rendered SVG <img> tag.
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
The connected-components SVG renders with only 2 nodes and 0 edges (broken Mermaid render), making it useless in the tutorial. Reverting the prior mistaken restore; the image reference should remain absent from the Connected Components section.
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
…syntax
Chained undirected edges (A --- B --- C) fail to render in LR layout with some Mermaid versions, producing a broken SVG with only 2 nodes. Fix by using explicit two-node edge lines instead.
Also restore the connected-components diagram in the docs tutorial (04-pregel-tutorial.md) which was previously removed due to the broken SVG. With the correct rendering (18 nodes, 3 supersteps), the diagram is now suitable for the tutorial.
Update generate_diagrams.py to use the fixed Mermaid syntax with a comment explaining the workaround.
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
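The workaround amounts to expanding each chain into one edge per line before handing the text to Mermaid. A minimal sketch, with hypothetical helper names that are not taken from generate_diagrams.py:

```python
def explicit_edges(chain: list[str]) -> list[str]:
    """Expand a chain like A --- B --- C into one two-node edge per line."""
    return [f"{left} --- {right}" for left, right in zip(chain, chain[1:])]


def render_diagram(chain: list[str]) -> str:
    """Build Mermaid source for a left-to-right graph from a node chain."""
    lines = ["graph LR"] + [f"    {edge}" for edge in explicit_edges(chain)]
    return "\n".join(lines)


# Produces "A --- B" and "B --- C" on separate lines instead of one chained line.
source = render_diagram(["A", "B", "C"])
```

Generating the edge lines programmatically keeps the diagram definition compact while avoiding the chained syntax that rendered incorrectly.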
In Jupyter notebooks and docs, <figcaption> rendered as unstyled block text indistinguishable from surrounding headings/paragraphs.
- Add a <style> block at the top of each notebook's first markdown cell to globally apply caption styling (smaller, italic, grey, centered) via the figcaption CSS selector
- Add equivalent inline style attributes to every <figcaption> in the Laika docs markdown files (04-pregel-tutorial.md and 02-motif-tutorial.md) for the static site build
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
These SVG files were accidentally generated by running generate_diagrams.py from the python/ working directory (causing relative output path to resolve as python/docs/src/img/ instead of docs/src/img/).
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
Replace relative 'docs/src/img/...' path with one resolved from __file__ so the script writes to the correct location regardless of cwd.
Also includes minor SVG whitespace normalization from mmdc 0.4.1 re-render (stroke-width:3px vs stroke-width: 3px, functionally identical).
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
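A sketch of resolving the output folder from the script's own location rather than the working directory. The function name and the `levels_up` parameter are hypothetical; the value 3 assumes the script sits at python/graphframes/tutorials/generate_diagrams.py, as listed in the file summary.

```python
from pathlib import Path


def output_dir(script_file: str, levels_up: int = 3) -> Path:
    """Return <repo>/docs/src/img/pregel-diagrams for a script inside the repo.

    parents[0] is tutorials/, parents[1] is graphframes/, parents[2] is python/,
    and parents[3] is the repository root.
    """
    repo_root = Path(script_file).resolve().parents[levels_up]
    return repo_root / "docs" / "src" / "img" / "pregel-diagrams"


# Inside the real script this would be called as output_dir(__file__).
example = output_dir("/repo/python/graphframes/tutorials/generate_diagrams.py")
```

Because the result depends only on where the file lives, running the script from python/ or from the repository root yields the same destination.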
The previous approach used per-element inline style= attributes on the 15 figcaptions across the two tutorial docs files. That only covered existing files; any future doc page with a figcaption would be unstyled.
Replace with a proper Laika-level fix:
- Add docs/src/helium/custom.css with the figcaption rule
- Wire it into the Helium theme via .site.internalCSS() in LaikaCustoms.scala (required in Laika 1.0+; the old automatic CSS directory scanning is gone)
- Strip the now-redundant inline style= attributes from both tutorial docs
The <style> block in the notebooks (Jupyter rendering) is unchanged; it remains the correct mechanism for notebooks since they do not go through Laika.
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
…s/97K edges
Every mention of the ~130K nodes / ~97K edges figures now names the specific Stack Exchange dataset (stats.meta.stackexchange.com) so readers know exactly which archive produces those numbers.
Updated in:
- docs/src/03-tutorials/04-pregel-tutorial.md
- docs/src/03-tutorials/02-motif-tutorial.md
- python/graphframes/tutorials/notebooks/Pregel.ipynb (cell 2)
- python/graphframes/tutorials/notebooks/Network_Motif_Finding.ipynb (cell 0)
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
<center> is a deprecated element that centers relative to its nearest
containing block. In Helium's layout that block spans the full page
(content column + both sidebars), so figures appeared shifted right.
Fix by:
- Adding figure { margin: 1em 0; text-align: center; } and
figure img { max-width: 100%; height: auto; } to custom.css and
the notebooks' <style> block — centering is now done via CSS
within the content column's own containing block
- Removing all 30 <center>/</center> wrappers from the two tutorial
docs and both notebooks; plain <figure> blocks are sufficient
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
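Removing the wrappers is a mechanical edit, and one way it could be scripted is sketched below. The regex and function name are assumptions for illustration; the conversation does not say how the tags were actually removed.

```python
import re

# Matches an opening or closing <center> tag plus any whitespace right after it.
CENTER_TAG = re.compile(r"</?center>\s*", re.IGNORECASE)


def strip_center(html: str) -> tuple[str, int]:
    """Drop <center> wrappers, returning the new text and the number of tags removed."""
    return CENTER_TAG.subn("", html)


before = '<center>\n<figure>\n<img src="a.svg">\n</figure>\n</center>\n'
after, removed = strip_center(before)
```

The returned count makes it easy to confirm that every wrapper in a file was found, with centering left to the `figure` CSS rule described in the commit.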
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
| Coverage Diff | main | #809 | +/- |
|---|---|---|---|
| Coverage | 84.94% | 81.21% | -3.74% |
| Files | 68 | 77 | +9 |
| Lines | 3507 | 4263 | +756 |
| Branches | 453 | 488 | +35 |
| Hits | 2979 | 3462 | +483 |
| Misses | 528 | 801 | +273 |
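The coverage percentages in the report follow from the Hits and Lines rows. A quick check, assuming coverage is computed as hits divided by lines and rounded to two decimals:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimal places."""
    return round(100 * hits / lines, 2)


# Figures copied from the report above.
base = coverage_pct(2979, 3507)  # main
head = coverage_pct(3462, 4263)  # this PR

# Misses are the lines that were not hit.
base_misses = 3507 - 2979
head_misses = 4263 - 3462
```

Both percentages and both miss counts match the rows reported by Codecov.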