Replace custom Cypher parser with lance_parser integration#173
Replace custom Cypher parser with lance_parser integration#173AdaWorldAPI wants to merge 9 commits intomainfrom
Conversation
Session 1 of the brain surgery: delete the dead Cypher transpiler (src/query/cypher.rs, 1560 lines) and rewire the entire Cypher pipeline to use lance_parser (P3) as the single parser. Changes: - DELETE src/query/cypher.rs (P1 — hand-rolled transpiler, dead code) - SAVE CTE generator logic to src/query/cte_builder.rs (standalone) - REWRITE src/cypher_bridge.rs: remove CypherOp/NodeRef/WhereClause/ CypherValue enums and parse_cypher(); execute_cypher() now takes lance_parser::ast::CypherQuery directly. Zero bridge types. - FIX src/query/hybrid.rs: use P3 CypherQuery + GraphPattern instead of P1 CypherParser/PatternElement - WIRE src/bin/server.rs /cypher endpoint: parse_cypher_query → execute_cypher against BindSpace → JSON response (was: transpile only) - RENAME CypherOp → CypherInstruction in cam_ops.rs (P4, opcode enum) - FIX imports in lib.rs, query/mod.rs, storage/database.rs Net: -1628 lines. One Cypher path instead of five. cargo check --no-default-features --features "simd" passes clean. https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08955e5d16
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !has_match && !query.reading_clauses.is_empty() { | ||
| // Check for node-only patterns that indicate a write | ||
| for clause in &query.reading_clauses { | ||
| if let ReadingClause::Match(match_clause) = clause { |
There was a problem hiding this comment.
Execute mutation patterns before
has_match short-circuit
execute_cypher sets has_match = true for every ReadingClause::Match, but mutation handling is guarded by if !has_match. Because the mutation loop itself only inspects ReadingClause::Match, that branch is unreachable for any AST containing node/path patterns, so MERGE/CREATE-style patterns never call execute_merge_node or execute_create_edge_from_path and mutation stats stay zero.
Useful? React with 👍 / 👎.
| let sql = build_recursive_cte( | ||
| start_label, | ||
| &rel_types, | ||
| 1, | ||
| 5, |
There was a problem hiding this comment.
Preserve Cypher semantics in
Database::cypher SQL generation
Database::cypher now ignores most of the parsed query and always emits a fixed recursive traversal (min_hops=1, max_hops=5, no WHERE/projection wiring). For common queries like MATCH (n:Label) RETURN n, this forces an edges traversal and can return unrelated or empty results instead of the requested Cypher result set, which is a functional regression from the prior transpilation path.
Useful? React with 👍 / 👎.
| 1, | ||
| max_depth as u32, | ||
| None, | ||
| Some(&format!("source.id = '{}'", source_id)), |
There was a problem hiding this comment.
Use a valid alias in butterfly WHERE filtering
detect_butterflies injects source.id = ... into user_where, but the SQL produced by build_recursive_cte only exposes t and n aliases in the final WHERE. This generates SQL referencing an undefined source alias, causing butterfly detection queries to fail at execution time.
Useful? React with 👍 / 👎.
n8n-grpc builds protobuf schemas at compile time via prost-build. The Docker build already has protobuf-compiler installed, but CI workflows were missing it. Added `apt-get install protobuf-compiler` to all jobs in ci-master.yml (build, test, lint, miri) and proof.yml (compile-check + 5 test jobs). https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
Replaces direct lance::Dataset usage with lancedb SDK throughout: - Cargo.toml: lance dep → lancedb dep, clean vendor patch comments - storage/lance.rs: LanceStore uses lancedb::Connection + Table - storage/lance_persistence.rs: all methods now &self (no mut needed), connect() returns fresh Connection each call - query/datafusion.rs: register_lance_tables uses lancedb SDK - lib.rs: Error impl for lancedb::Error - database.rs: Database::memory() now async - python/mod.rs: memory() handles async via block_on Versioning comes free: every write creates a new MVCC version. Both feature combos pass: --features "simd,lancedb" and --features "simd". https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
Session 2 — make the private SPO substrate accessible within the crate: - src/spo/mod.rs: `mod spo` → `pub(crate) mod spo` - src/spo/spo.rs: SPOCrystal, OrthogonalCodebook, Triple, Qualia now pub(crate) with key methods (new, insert, resonate_spo, query_*) - core/fingerprint.rs: add dot_bipolar() + project_out() (ported from spo.rs lines 109-138, deterministic LCG instead of rand dependency) - src/spo/crystal_api.rs: thin re-export layer for single-import access - src/core/simd.rs: add hamming_scalar() reference implementation + re-export hamming_distance(), fixing proof_level_a_gaps CI failure https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
MetacognitiveSubstrate.crystal is [[[Fingerprint; 5]; 5]; 5] = 125 × 2048 bytes = 256KB. When constructed on the stack in CodebookTrainer::new(), this overflows the default 8MB thread stack (especially with 4 test threads each creating their own trainer). Fix: Box<[[[Fingerprint; 5]; 5]; 5]> moves the allocation to the heap. https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
- Move remaining 16384-element stack arrays to heap (vec![]) in codebook_training, deepnsm_integration, meta_resonance, nsm_substrate - Add hamming_scalar reference function to rustynum_accel - Fix proof_level_a_gaps test import (core::simd → core) - Remove commented-out nightly portable_simd feature gate from lib.rs - All SIMD now delegates to rustynum stable (runtime AVX2 fallback) https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
- Remove redundant hamming_scalar from rustynum_accel (simd.rs has it)
- Fix test import to use core::simd::{hamming_distance, hamming_scalar}
- Remove hamming_scalar re-export from core/mod.rs
- rustynum handles AVX2→scalar fallback silently at runtime
https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
- Remove entire Miri job from ci-master.yml (required nightly toolchain) - Remove miri from CI summary needs and output - Pin all workflow toolchains from 1.93.0 to 1.93.1 - rustynum handles SIMD fallback on stable, no nightly needed https://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN
Summary
This PR removes the custom Cypher parser implementation (
src/query/cypher.rs) and integrates the externallance_parsercrate for Cypher query parsing. Thecypher_bridgemodule is refactored to work directly withlance_parserAST types instead of intermediate custom types, eliminating redundant parsing logic.Key Changes
Deleted
src/query/cypher.rs(1560 lines): Removed custom Cypher tokenizer, parser, and AST types (CypherQuery, Pattern, EdgePattern, NodePattern, etc.)Refactored
src/cypher_bridge.rs:lance_parser::asttypes directly (CypherQuery, GraphPattern, MatchClause, NodePattern, etc.)CypherOpenum and custom value typesCypherResult.rowsfromHashMap<String, CypherValue>toHashMap<String, serde_json::Value>lance_parser::parse_cypher_query()→execute_cypher()→ BindSpace mutations/readsexecute_match(),build_return_results(),execute_merge_node(), andexecute_create_edge_from_path()to work with lance_parser ASTCreated
src/query/cte_builder.rs: Preserved recursive CTE generation logic from deletedcypher.rsfor future adaptation to lance_parser types (marked as TODO)Updated
src/bin/server.rs: Modifiedhandle_cypher()to acceptSharedStateparameter and use the new execution pipelineUpdated
src/storage/database.rs: Changed import fromcypher_to_sqltoparse_cypher_queryand addedcte_buildermodule referenceUpdated
src/query/hybrid.rs: Changed imports to uselance_parser::astandparse_cypher_queryinstead of custom typesUpdated
src/query/mod.rs: Removedcyphermodule, addedcte_builderas public moduleRenamed in
src/learning/cam_ops.rs:CypherOpenum renamed toCypherInstructionto avoid naming conflictsImplementation Details
serde_json::Valuefor JSON serialization compatibilityevaluate_where()function that works with lance_parser'sBooleanExpressiontypehttps://claude.ai/code/session_017q9AqMNTojdRGrmp9BX4EN