doc: Pregel Tutorial #809

Open
rjurney wants to merge 124 commits into graphframes:main from rjurney:rjurney/pypi-tutorials

@rjurney
Collaborator

@rjurney rjurney commented Mar 13, 2026

  1. New, big, fancy, super, duper Pregel tutorial
  2. Moved the Stack Exchange data content from the Network Motif Finding Tutorial into a Data Setup tutorial, which both the motif and Pregel tutorials now reference.
  3. Point at new tutorial(s) from list of tutorials.
  4. New network motif and Pregel tutorial Jupyter notebooks
  5. Some other minor changes...

…s.txt and split out requirements-dev.txt. Version bumps.
@SemyonSinchenko SemyonSinchenko changed the title from "Pregel Tutorial" to "doc: Pregel Tutorial" on Mar 13, 2026
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Collaborator Author

@rjurney rjurney left a comment

Need to make all changes suggested by @copilot, @rjurney or @SemyonSinchenko

@rjurney
Collaborator Author

rjurney commented Mar 13, 2026 via email

- Wrap pregel.py in main() with if __name__ guard to prevent side effects on import
- Remove unused msg_to_dst variable in pregel.py Example 1
- Remove numpy dependency (not used anywhere in codebase)
- Add mmdc to tutorials extras (required by generate_diagrams.py)
- Restore Emacs gitignore entries, remove Anaconda entries
- Delete extraneous files: BLOG_POST.md, RESEARCH.md, VERIFICATION.md, WRITING_STYLE.md
- Regenerate poetry.lock after dependency changes

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
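
The first item in the list above (wrapping a script in `main()` behind an `if __name__` guard) can be sketched roughly as follows. This is a minimal illustration, not the actual contents of pregel.py; `build_banner` and its message are hypothetical names introduced only for the example.

```python
"""Sketch of a tutorial script that is safe to import (hypothetical layout)."""


def build_banner(example_name: str) -> str:
    # Pure helper: importing this module can reuse it without side effects.
    return f"Running {example_name} example"


def main() -> None:
    # Everything with side effects (printing, starting Spark, reading data)
    # lives here, so it only runs when the file is executed as a script.
    print(build_banner("pregel"))


if __name__ == "__main__":
    main()
```

With this layout, `import`-ing the module only defines functions; nothing runs until `main()` is called explicitly or the file is executed directly.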
@rjurney
Collaborator Author

rjurney commented Mar 13, 2026

@cursoragent please review again

@rjurney
Collaborator Author

rjurney commented Mar 13, 2026

@claude review

Copilot AI left a comment

Pull request overview

Adds a new Pregel tutorial (docs + runnable script + diagrams) and refactors Stack Exchange data setup documentation/CLI to support the tutorial workflow, while also updating various formatting/configuration files across the Python and docs toolchain.

Changes:

  • Introduces a comprehensive Pregel tutorial (Markdown + Python script) and supporting Mermaid→SVG diagram generation.
  • Adds a standalone “Data Setup” tutorial and updates the Motif tutorial to reference it.
  • Updates Python packaging/config (dependencies, optional extras, tooling versions) and docs/build tweaks (Laika benchmarks fallback, Python API link).

Reviewed changes

Copilot reviewed 21 out of 34 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| python/tests/test_graphframes.py | Formatting-only tweaks in tests. |
| python/pyproject.toml | Adds core deps + tutorials optional deps; bumps dev tool versions; updates Black config. |
| python/graphframes/tutorials/stackexchange.py | Updates usage header/comments for Spark 3.5 vs 4.x XML handling. |
| python/graphframes/tutorials/pregel.py | New runnable Pregel tutorial script with 7 examples. |
| python/graphframes/tutorials/motif.py | Updates header usage for Spark 3.5 vs 4.x GraphFrames artifacts. |
| python/graphframes/tutorials/generate_diagrams.py | New Mermaid diagram renderer producing tutorial SVG assets. |
| python/graphframes/tutorials/download.py | Changes CLI option naming/default folder for StackExchange downloader. |
| python/graphframes/connect/proto/graphframes_pb2.py | Formatting-only tweak in generated proto file. |
| python/graphframes/connect/graphframes_client.py | Formatting-only tweak for argument wrapping. |
| python/docs/underscores.py | Formatting and minor refactor of Sphinx extension helpers. |
| python/docs/epytext.py | Formatting and minor refactor of Sphinx extension helpers. |
| python/docs/conf.py | Formatting/quoting cleanup in Sphinx config. |
| python/dev/build_jar.py | Minor formatting simplification. |
| project/LaikaCustoms.scala | Adds fallback benchmark config values when benchmarks file is missing; updates Python API link. |
| docs/src/img/pregel-diagrams/*.svg | Adds generated SVG diagrams used by the Pregel tutorial. |
| docs/src/03-tutorials/05-neo4j-tutorial.md | Adds a Neo4j “validation summary” tutorial page. |
| docs/src/03-tutorials/04-pregel-tutorial.md | Adds the full Pregel tutorial content (large). |
| docs/src/03-tutorials/03-data-setup.md | Adds a dedicated StackExchange data setup tutorial. |
| docs/src/03-tutorials/02-motif-tutorial.md | Updates motif tutorial intro and data setup instructions/links. |
| docs/src/03-tutorials/01-tutorials.md | Adds links for the new Data Setup and Pregel tutorials. |

Comment on lines 44 to +88
@@ -61,17 +73,19 @@ enable = true
 "Bug Tracker" = "https://github.com/graphframes/graphframes/issues"

 [tool.poetry.group.dev.dependencies]
-black = "^23.12.1"
-flake8 = "^7.1.1"
-isort = "^6.0.0"
+black = "^25.12.0"
+flake8 = "^7.3.0"
+isort = "^7.0.0"
 pyspark = { version = ">=3.5, <4.1", extras = ["connect"] }
 grpcio = "<=1.67.1"
-pytest = "^8.3.4"
+pytest = "^9.0.2"
+pre-commit = "^4.5.1"

 [tool.poetry.group.tutorials.dependencies]
-py7zr = "^0.22.0"
-requests = "^2.32.4"
-click = "^8.1.8"
+py7zr = "^1.1.0"
+requests = "^2.32.5"
+click = "^8.3.1"
+mmdc = ">=0.4.0"
Contributor

@SemyonSinchenko what do you want to do here?

Collaborator Author

@SemyonSinchenko shit, I closed this....

Collaborator

@rjurney Please, can you remove everything related to reformatting the whole codebase from this PR? It is a) hard to review and b) will create a huge merge conflict.

I will do it myself before the release, after we merge everything. Does that sound OK to you?

Collaborator Author

Quite a reasonable request :)

@russelljurney-upside
Contributor

Fixes #810

cursoragent and others added 3 commits March 14, 2026 02:03
- Move click to core [project] dependencies (fixes broken CLI on base install)
- Lazy-import tutorials.download in console.py (py7zr/requests stay optional)
- Revert download.py to --data-dir with package-relative default path
- Add --data-dir CLI option to pregel.py, motif.py, stackexchange.py
- Wrap motif.py and stackexchange.py in main() with if __name__ guard
- All tutorial scripts now use consistent --data-dir for data path

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
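
The lazy-import item in the commit message above can be illustrated with a small standard-library sketch. It is an assumption about the general pattern, not the code in console.py; `load_optional` is a hypothetical helper, and the module names in the usage lines are placeholders.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency only at the point a command needs it.

    Hypothetical helper: the base package stays importable even when the
    optional dependency is not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"'{module_name}' is required for this command; "
            f"install the '{extra}' extra to enable it."
        ) from exc


# A stdlib module stands in for an installed optional dependency.
json_module = load_optional("json", extra="tutorials")

# A missing dependency fails only when the command is actually invoked.
try:
    load_optional("module_that_is_not_installed_xyz", extra="tutorials")
    missing_error = None
except RuntimeError as err:
    missing_error = str(err)
```

The design choice is that the import cost and the failure both move from module import time to command invocation time, which is what keeps the base CLI working without the tutorial dependencies.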
- Use regex with negative lookahead in generate_diagrams.py to avoid
  mutating multi-digit stroke-width values (e.g., stroke-width:10)
- Remove neo4j tutorial (not part of this PR per reviewer)
- Update data-setup tutorial to document --data-dir CLI option
- Update pregel and motif tutorial docs to use data_dir path pattern

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
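
The negative-lookahead fix described above can be sketched as follows. The concrete rewrite (thickening a stroke width of 1 to 3) is an assumption made for illustration; the actual values and pattern in generate_diagrams.py are not shown in this thread.

```python
import re

# Hypothetical rewrite: match "stroke-width:1" only when the 1 is the whole
# number, i.e. not followed by another digit or a decimal point.
THIN_STROKE = re.compile(r"stroke-width:\s*1(?![\d.])")


def thicken(svg: str) -> str:
    """Replace single-digit stroke-width 1 with 3, leaving 10, 12, 1.5 alone."""
    return THIN_STROKE.sub("stroke-width:3", svg)


sample = '<path style="stroke-width:1px"/><path style="stroke-width:10"/>'

# A plain substring replace also rewrites the prefix of "stroke-width:10".
naive = sample.replace("stroke-width:1", "stroke-width:3")

# The lookahead version only touches the first path.
fixed = thicken(sample)
```

Here `naive` turns the second path into `stroke-width:30`, while `fixed` leaves it at `stroke-width:10`.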
Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
@rjurney
Collaborator Author

rjurney commented Mar 14, 2026

@SemyonSinchenko I updated https://github.com/rjurney/graphframes/tree/rjurney/pypi-tutorials fixing all the issues save one I asked you about, but it isn't updating this PR or running tests... I guess it is slack today.

@rjurney
Collaborator Author

rjurney commented Mar 14, 2026

Okay, there it goes!

cursoragent and others added 10 commits March 14, 2026 03:07
The Jupyter notebooks in python/graphframes/tutorials/notebooks/ reference
images via relative paths like ../img/... which resolve to
python/graphframes/tutorials/img/. However, the actual images live in
docs/src/img/.

Add a symlink python/graphframes/tutorials/img -> ../../../docs/src/img
so all 15 image references in the notebooks (Network_Motif_Finding.ipynb
and Pregel.ipynb) resolve correctly without duplicating files.

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
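
The path arithmetic behind the symlink can be checked with a short sketch. The directory names come from the commit message above; the specific image reference is a hypothetical example of the `../img/...` pattern.

```python
import posixpath

TUTORIALS_DIR = "python/graphframes/tutorials"

# A notebook in notebooks/ referencing "../img/..." (hypothetical file name).
notebook_ref = "../img/pregel-diagrams/pregel-connected-components.svg"
resolved_ref = posixpath.normpath(
    posixpath.join(TUTORIALS_DIR, "notebooks", notebook_ref)
)

# The symlink python/graphframes/tutorials/img points at ../../../docs/src/img,
# which is resolved relative to the directory that contains the link.
link_target = posixpath.normpath(
    posixpath.join(TUTORIALS_DIR, "../../../docs/src/img")
)

print(resolved_ref)
print(link_target)
```

The first path lands inside `python/graphframes/tutorials/img/`, and the link target normalizes to `docs/src/img`, so the notebook references resolve to the real image directory without copying files.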
Commit ec1b084 accidentally removed the connected components diagram
from the Connected Components section of the Pregel tutorial. The SVG
file exists at docs/src/img/pregel-diagrams/pregel-connected-components.svg
and should be shown like all other example sections in the tutorial.

The original commit was mislabeled as removing a 'Mermaid diagram' but
actually removed a rendered SVG <img> tag.

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>

The connected-components SVG renders with only 2 nodes and 0 edges
(broken Mermaid render), making it useless in the tutorial. Reverting
the prior mistaken restore; the image reference should remain absent
from the Connected Components section.

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>

…syntax

Chained undirected edges (A --- B --- C) fail to render in LR layout
with some Mermaid versions, producing a broken SVG with only 2 nodes.
Fix by using explicit two-node edge lines instead.

Also restore the connected-components diagram in the docs tutorial
(04-pregel-tutorial.md) which was previously removed due to the broken
SVG. With the correct rendering (18 nodes, 3 supersteps), the diagram
is now suitable for the tutorial.

Update generate_diagrams.py to use the fixed Mermaid syntax with a
comment explaining the workaround.

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
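
The workaround described above (explicit two-node edge lines instead of a chained `A --- B --- C`) can be sketched as a small generator. This is an illustration under assumptions: `explicit_edges` and `flowchart` are hypothetical helpers, and the `flowchart LR` header is assumed rather than taken from generate_diagrams.py.

```python
def explicit_edges(chain: list[str], connector: str = "---") -> list[str]:
    """Expand a chain of nodes into one Mermaid edge line per adjacent pair."""
    return [f"{a} {connector} {b}" for a, b in zip(chain, chain[1:])]


def flowchart(chains: list[list[str]], direction: str = "LR") -> str:
    """Build Mermaid source that never uses chained edge syntax."""
    lines = [f"flowchart {direction}"]
    for chain in chains:
        lines.extend(f"    {edge}" for edge in explicit_edges(chain))
    return "\n".join(lines)


# Instead of emitting "A --- B --- C" on one line:
source = flowchart([["A", "B", "C"]])
print(source)
```

Each undirected edge gets its own line, which sidesteps the chained-edge rendering problem mentioned in the commit message.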
In Jupyter notebooks and docs, <figcaption> rendered as unstyled
block text indistinguishable from surrounding headings/paragraphs.

- Add a <style> block at the top of each notebook's first markdown
  cell to globally apply caption styling (smaller, italic, grey,
  centered) via the figcaption CSS selector
- Add equivalent inline style attributes to every <figcaption> in
  the Laika docs markdown files (04-pregel-tutorial.md and
  02-motif-tutorial.md) for the static site build

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>

These SVG files were accidentally generated by running generate_diagrams.py
from the python/ working directory (causing relative output path to resolve
as python/docs/src/img/ instead of docs/src/img/).

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>

Replace relative 'docs/src/img/...' path with one resolved from __file__
so the script writes to the correct location regardless of cwd.

Also includes minor SVG whitespace normalization from mmdc 0.4.1 re-render
(stroke-width:3px vs stroke-width: 3px — functionally identical).

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
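
Resolving the output directory from the script location rather than the working directory might look like the sketch below. The file locations are the ones named in this thread; `output_dir_for` is a hypothetical helper, and the real script may structure this differently.

```python
from pathlib import Path


def output_dir_for(script_path: str) -> Path:
    """Return the diagram output directory relative to the script, not the cwd."""
    script = Path(script_path).resolve()
    # <repo>/python/graphframes/tutorials/generate_diagrams.py
    # parents[0]=tutorials, [1]=graphframes, [2]=python, [3]=<repo>
    repo_root = script.parents[3]
    return repo_root / "docs" / "src" / "img" / "pregel-diagrams"


# Inside the script itself this would be called as output_dir_for(__file__).
example = output_dir_for("/repo/python/graphframes/tutorials/generate_diagrams.py")
print(example)
```

Because the result depends only on where the script lives, running it from `python/` or from the repository root writes to the same `docs/src/img/pregel-diagrams` directory.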
The previous approach used per-element inline style= attributes on the 15
figcaptions across the two tutorial docs files. That only covered existing
files — any future doc page with a figcaption would be unstyled.

Replace with a proper Laika-level fix:
- Add docs/src/helium/custom.css with the figcaption rule
- Wire it into the Helium theme via .site.internalCSS() in LaikaCustoms.scala
  (required in Laika 1.0+; the old automatic CSS directory scanning is gone)
- Strip the now-redundant inline style= attributes from both tutorial docs

The <style> block in the notebooks (Jupyter rendering) is unchanged — it
remains the correct mechanism for notebooks since they do not go through Laika.

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>

…s/97K edges

Every mention of the ~130K nodes / ~97K edges figures now names the specific
Stack Exchange dataset (stats.meta.stackexchange.com) so readers know exactly
which archive produces those numbers.

Updated in:
- docs/src/03-tutorials/04-pregel-tutorial.md
- docs/src/03-tutorials/02-motif-tutorial.md
- python/graphframes/tutorials/notebooks/Pregel.ipynb (cell 2)
- python/graphframes/tutorials/notebooks/Network_Motif_Finding.ipynb (cell 0)

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>

<center> is a deprecated element that centers relative to its nearest
containing block. In Helium's layout that block spans the full page
(content column + both sidebars), so figures appeared shifted right.

Fix by:
- Adding figure { margin: 1em 0; text-align: center; } and
  figure img { max-width: 100%; height: auto; } to custom.css and
  the notebooks' <style> block — centering is now done via CSS
  within the content column's own containing block
- Removing all 30 <center>/</center> wrappers from the two tutorial
  docs and both notebooks; plain <figure> blocks are sufficient

Co-authored-by: Russell Jurney <rjurney@users.noreply.github.com>
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.21%. Comparing base (f1db6f4) to head (957fb9c).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #809      +/-   ##
==========================================
- Coverage   84.94%   81.21%   -3.74%     
==========================================
  Files          68       77       +9     
  Lines        3507     4263     +756     
  Branches      453      488      +35     
==========================================
+ Hits         2979     3462     +483     
- Misses        528      801     +273     

6 participants