Skip to content

Replace custom Cypher parser with lance_parser integration#173

Open
AdaWorldAPI wants to merge 9 commits intomainfrom
claude/setup-adaworld-env-zLRz0
Open

Replace custom Cypher parser with lance_parser integration#173
AdaWorldAPI wants to merge 9 commits intomainfrom
claude/setup-adaworld-env-zLRz0

Conversation

@AdaWorldAPI
Copy link
Owner

Summary

This PR removes the custom Cypher parser implementation (src/query/cypher.rs) and integrates the external lance_parser crate for Cypher query parsing. The cypher_bridge module is refactored to work directly with lance_parser AST types instead of intermediate custom types, eliminating redundant parsing logic.

Key Changes

  • Deleted src/query/cypher.rs (1560 lines): Removed custom Cypher tokenizer, parser, and AST types (CypherQuery, Pattern, EdgePattern, NodePattern, etc.)

  • Refactored src/cypher_bridge.rs:

    • Now imports and uses lance_parser::ast types directly (CypherQuery, GraphPattern, MatchClause, NodePattern, etc.)
    • Removed intermediate CypherOp enum and custom value types
    • Changed CypherResult.rows from HashMap<String, CypherValue> to HashMap<String, serde_json::Value>
    • Simplified execution flow: lance_parser::parse_cypher_query()execute_cypher() → BindSpace mutations/reads
    • Implemented execute_match(), build_return_results(), execute_merge_node(), and execute_create_edge_from_path() to work with lance_parser AST
  • Created src/query/cte_builder.rs: Preserved recursive CTE generation logic from deleted cypher.rs for future adaptation to lance_parser types (marked as TODO)

  • Updated src/bin/server.rs: Modified handle_cypher() to accept SharedState parameter and use the new execution pipeline

  • Updated src/storage/database.rs: Changed import from cypher_to_sql to parse_cypher_query and added cte_builder module reference

  • Updated src/query/hybrid.rs: Changed imports to use lance_parser::ast and parse_cypher_query instead of custom types

  • Updated src/query/mod.rs: Removed cypher module, added cte_builder as public module

  • Renamed in src/learning/cam_ops.rs: CypherOp enum renamed to CypherInstruction to avoid naming conflicts

Implementation Details

  • The new execution model is simpler: parse once with lance_parser, then execute directly against BindSpace without intermediate type conversions
  • MATCH queries now scan BindSpace nodes, apply label and WHERE filters, and build RETURN results from matched nodes
  • MERGE/CREATE operations detect write intent from query structure and execute node/edge creation with proper upsert semantics
  • Property values are converted to/from serde_json::Value for JSON serialization compatibility
  • WHERE clause evaluation uses a new evaluate_where() function that works with lance_parser's BooleanExpression type

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN

claude added 2 commits March 13, 2026 04:13
Session 1 of the brain surgery: delete the dead Cypher transpiler
(src/query/cypher.rs, 1560 lines) and rewire the entire Cypher
pipeline to use lance_parser (P3) as the single parser.

Changes:
- DELETE src/query/cypher.rs (P1 — hand-rolled transpiler, dead code)
- SAVE CTE generator logic to src/query/cte_builder.rs (standalone)
- REWRITE src/cypher_bridge.rs: remove CypherOp/NodeRef/WhereClause/
  CypherValue enums and parse_cypher(); execute_cypher() now takes
  lance_parser::ast::CypherQuery directly. Zero bridge types.
- FIX src/query/hybrid.rs: use P3 CypherQuery + GraphPattern instead
  of P1 CypherParser/PatternElement
- WIRE src/bin/server.rs /cypher endpoint: parse_cypher_query →
  execute_cypher against BindSpace → JSON response (was: transpile only)
- RENAME CypherOp → CypherInstruction in cam_ops.rs (P4, opcode enum)
- FIX imports in lib.rs, query/mod.rs, storage/database.rs

Net: -1628 lines. One Cypher path instead of five.
cargo check --no-default-features --features "simd" passes clean.

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08955e5d16

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +92 to +95
if !has_match && !query.reading_clauses.is_empty() {
// Check for node-only patterns that indicate a write
for clause in &query.reading_clauses {
if let ReadingClause::Match(match_clause) = clause {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Execute mutation patterns before has_match short-circuit

execute_cypher sets has_match = true for every ReadingClause::Match, but mutation handling is guarded by if !has_match. Because the mutation loop itself only inspects ReadingClause::Match, that branch is unreachable for any AST containing node/path patterns, so MERGE/CREATE-style patterns never call execute_merge_node or execute_create_edge_from_path and mutation stats stay zero.

Useful? React with 👍 / 👎.

Comment on lines +119 to +123
let sql = build_recursive_cte(
start_label,
&rel_types,
1,
5,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve Cypher semantics in Database::cypher SQL generation

Database::cypher now ignores most of the parsed query and always emits a fixed recursive traversal (min_hops=1, max_hops=5, no WHERE/projection wiring). For common queries like MATCH (n:Label) RETURN n, this forces an edges traversal and can return unrelated or empty results instead of the requested Cypher result set, which is a functional regression from the prior transpilation path.

Useful? React with 👍 / 👎.

1,
max_depth as u32,
None,
Some(&format!("source.id = '{}'", source_id)),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use a valid alias in butterfly WHERE filtering

detect_butterflies injects source.id = ... into user_where, but the SQL produced by build_recursive_cte only exposes t and n aliases in the final WHERE. This generates SQL referencing an undefined source alias, causing butterfly detection queries to fail at execution time.

Useful? React with 👍 / 👎.

claude added 7 commits March 13, 2026 04:47
n8n-grpc builds protobuf schemas at compile time via prost-build.
The Docker build already has protobuf-compiler installed, but CI
workflows were missing it. Added `apt-get install protobuf-compiler`
to all jobs in ci-master.yml (build, test, lint, miri) and
proof.yml (compile-check + 5 test jobs).

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
Replaces direct lance::Dataset usage with lancedb SDK throughout:
- Cargo.toml: lance dep → lancedb dep, clean vendor patch comments
- storage/lance.rs: LanceStore uses lancedb::Connection + Table
- storage/lance_persistence.rs: all methods now &self (no mut needed),
  connect() returns fresh Connection each call
- query/datafusion.rs: register_lance_tables uses lancedb SDK
- lib.rs: Error impl for lancedb::Error
- database.rs: Database::memory() now async
- python/mod.rs: memory() handles async via block_on

Versioning comes free: every write creates a new MVCC version.
Both feature combos pass: --features "simd,lancedb" and --features "simd".

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
Session 2 — make the private SPO substrate accessible within the crate:

- src/spo/mod.rs: `mod spo` → `pub(crate) mod spo`
- src/spo/spo.rs: SPOCrystal, OrthogonalCodebook, Triple, Qualia now
  pub(crate) with key methods (new, insert, resonate_spo, query_*)
- core/fingerprint.rs: add dot_bipolar() + project_out() (ported from
  spo.rs lines 109-138, deterministic LCG instead of rand dependency)
- src/spo/crystal_api.rs: thin re-export layer for single-import access
- src/core/simd.rs: add hamming_scalar() reference implementation +
  re-export hamming_distance(), fixing proof_level_a_gaps CI failure

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
MetacognitiveSubstrate.crystal is [[[Fingerprint; 5]; 5]; 5] = 125 × 2048
bytes = 256KB. When constructed on the stack in CodebookTrainer::new(),
this overflows the default 8MB thread stack (especially with 4 test threads
each creating their own trainer).

Fix: Box<[[[Fingerprint; 5]; 5]; 5]> moves the allocation to the heap.

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
- Move remaining 16384-element stack arrays to heap (vec![]) in
  codebook_training, deepnsm_integration, meta_resonance, nsm_substrate
- Add hamming_scalar reference function to rustynum_accel
- Fix proof_level_a_gaps test import (core::simd → core)
- Remove commented-out nightly portable_simd feature gate from lib.rs
- All SIMD now delegates to rustynum stable (runtime AVX2 fallback)

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
- Remove redundant hamming_scalar from rustynum_accel (simd.rs has it)
- Fix test import to use core::simd::{hamming_distance, hamming_scalar}
- Remove hamming_scalar re-export from core/mod.rs
- rustynum handles AVX2→scalar fallback silently at runtime

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
- Remove entire Miri job from ci-master.yml (required nightly toolchain)
- Remove miri from CI summary needs and output
- Pin all workflow toolchains from 1.93.0 to 1.93.1
- rustynum handles SIMD fallback on stable, no nightly needed

https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants