Skip to content

Add diagram anchor integrity check (EDUENG-613)#23138

Open
ebembi-crdb wants to merge 1 commit intomainfrom
edueng-613-validate-diagram-anchors
Open

Add diagram anchor integrity check (EDUENG-613)#23138
ebembi-crdb wants to merge 1 commit intomainfrom
edueng-613-validate-diagram-anchors

Conversation

@ebembi-crdb
Copy link
Copy Markdown
Contributor

Summary

Adds a script and GitHub Actions workflow that catch broken sql-grammar.html#anchor references in SQL diagram files before they block production builds.

How it works:

  • Scans doc files for {% remote_include ... crdb_branch_name .../grammar_svg/DIAGRAM.html %} tags
  • Fetches each referenced diagram HTML from cockroachdb/generated-diagrams
  • Verifies every sql-grammar.html#ANCHOR link inside it resolves against stmt_block.html on the same branch

Files added:

  • .github/scripts/validate_diagram_anchors.py — stdlib-only Python script, no pip installs required
  • .github/workflows/validate-diagram-anchors.yml — triggers on *.md changes in PRs (changed files only) and runs a full scan daily at 07:15 UTC

On PR failure: posts/updates a bot comment with the broken diagram, branch, and anchor, plus which doc files reference it. Blocks merge.
On scheduled failure: opens a GitHub issue (or updates an existing open one) with label sql-diagram-validation.

Context

Addresses EDUENG-613, part of the follow-up to the production build outage on 2026-01-29 discussed in #docs-site-status.

Root cause of that outage: show_statement_hints.html was added with a link to sql-grammar.html#opt_with_show_hints_options, but that anchor didn't yet exist in stmt_block.html on release-26.1. This check would have caught it before merge.

The branch existence check (EDUENG-614) is in a separate PR: #23137

Test plan

  • Manually run python .github/scripts/validate_diagram_anchors.py src/current/v26.1/show-statement-hints.md from repo root
  • Verify the workflow triggers on a .md change in a test PR and only scans changed files
  • Confirm the daily schedule appears in the Actions tab after merge

Adds a script and daily workflow that scan doc files for remote_include tags
pulling from cockroachdb/generated-diagrams grammar_svg, fetch each referenced
diagram HTML, and verify that every sql-grammar.html#ANCHOR link inside it
resolves against stmt_block.html on the same branch.

This catches the exact failure that blocked production builds on 2026-01-29:
show_statement_hints.html referenced sql-grammar.html#opt_with_show_hints_options
but that anchor did not yet exist in stmt_block.html on release-26.1.

Files added:
- .github/scripts/validate_diagram_anchors.py
- .github/workflows/validate-diagram-anchors.yml
@ebembi-crdb ebembi-crdb requested a review from a team as a code owner March 26, 2026 14:03
@netlify
Copy link
Copy Markdown

netlify bot commented Mar 26, 2026

Deploy Preview for cockroachdb-api-docs canceled.

Name Link
🔨 Latest commit 148e88f
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-api-docs/deploys/69c53cc0f379ba0007df1cba

@github-actions
Copy link
Copy Markdown

Files changed:

  • .github/scripts/validate_diagram_anchors.py
  • .github/workflows/validate-diagram-anchors.yml

@netlify
Copy link
Copy Markdown

netlify bot commented Mar 26, 2026

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name Link
🔨 Latest commit 148e88f
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/69c53cc021ff7a0008efae01

@netlify
Copy link
Copy Markdown

netlify bot commented Mar 26, 2026

Netlify Preview

Name Link
🔨 Latest commit 148e88f
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/69c53cc071bdce0008dddb75
😎 Deploy Preview https://deploy-preview-23138--cockroachdb-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

import urllib.request
from pathlib import Path
from typing import Optional

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script fetches diagram and stmt_block.html via raw.githubusercontent.com and adds Authorization: Bearer if GITHUB_TOKEN is present. The raw host (raw.githubusercontent.com) does not use GitHub API auth tokens the same way the API does; adding the header is benign but may not help for private repos or rate limits. For authenticated/robust fetches:

Use the GitHub REST API endpoint:

GET /repos/{owner}/{repo}/contents/{path}?ref={branch}

which returns base64-encoded content and works with Authorization: Bearer <GITHUB_TOKEN>.

Or check that the raw URLs will not be rate-limited for your usage pattern (daily + PR checks is probably fine).

# ---------------------------------------------------------------------------
# Parsing helpers
# ---------------------------------------------------------------------------

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_stmt_block_anchors currently uses re.findall(r'\bid="'["']', content) which works in most cases but is fragile for edge HTML (e.g., ids broken across attributes/newlines, or presence of HTML comments/inline scripts). Since you explicitly avoid external deps, consider using Python’s stdlib html.parser to reliably collect id attributes:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants