
feat: SQL compliance, robustness, and performance improvements #3

Open
subhayu99 wants to merge 2 commits into main from subhayu99/explore-improvements

@subhayu99
Owner

Summary

  • SQL Compliance: Add HAVING clause, SELECT DISTINCT, FULL OUTER JOIN, IS NULL/IS NOT NULL support to the custom parser and executor. Fix NULL three-valued logic and empty GROUP BY aggregate defaults (e.g., SELECT COUNT(*) FROM empty.csv now returns 0).
  • Robustness: Deduplicate _evaluate_condition from 5 copies across readers/operators into a single shared utility. Narrow broad except Exception handlers to specific types. Add column-not-found fuzzy matching with "Did you mean?" suggestions.
  • Performance: Enable predicate pushdown for JOIN queries (conservatively limited to left-table columns). Add a condition-reordering optimizer that evaluates equality conditions before range conditions. Sync the reported version from importlib.metadata (it was stuck at 0.1.0).
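
For reviewers less familiar with the NULL three-valued logic mentioned under SQL Compliance, here is a minimal standalone sketch of the semantics. It is not the code from this PR: the helper names (`sql_eq`, `sql_and`, `sql_or`, `sql_not`, `where_keeps`) are hypothetical, and Python's `None` is assumed to stand in for SQL NULL / UNKNOWN.

```python
from typing import Any, Optional

# None stands in for SQL NULL / UNKNOWN throughout this sketch (an assumption,
# not necessarily how the project's executor represents NULL).
Tri = Optional[bool]


def sql_eq(left: Any, right: Any) -> Tri:
    """Comparing anything with NULL yields UNKNOWN, never True or False."""
    if left is None or right is None:
        return None
    return left == right


def sql_and(a: Tri, b: Tri) -> Tri:
    # FALSE dominates AND, even when the other operand is UNKNOWN.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a: Tri, b: Tri) -> Tri:
    # TRUE dominates OR, even when the other operand is UNKNOWN.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_not(a: Tri) -> Tri:
    # NOT UNKNOWN stays UNKNOWN.
    return None if a is None else not a


def where_keeps(result: Tri) -> bool:
    """A WHERE clause keeps a row only when the predicate is TRUE."""
    return result is True
```

Under these rules `sql_eq(None, None)` is UNKNOWN rather than TRUE, which is why a dedicated IS NULL / IS NOT NULL operator is needed to test for missing values.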

Test plan

  • All 649 existing tests pass
  • Ruff linter and formatter clean
  • Manually test SELECT DISTINCT city FROM examples/employees.csv
  • Manually test SELECT * FROM examples/employees.csv FULL OUTER JOIN examples/departments.csv ON department = name
  • Manually test SELECT department, COUNT(*) FROM examples/employees.csv GROUP BY department HAVING COUNT(*) > 1
  • Manually test SELECT * FROM examples/employees.csv WHERE nme = 'Alice' (should show fuzzy suggestion)
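
The fuzzy-suggestion check in the last item can be approximated with the standard library. The sketch below is one possible approach, assuming `difflib.get_close_matches` as the matcher; the function name `suggest_column`, the message wording, and the sample column list are hypothetical and may differ from what this PR implements.

```python
import difflib


def suggest_column(name: str, columns: list[str], cutoff: float = 0.6) -> str:
    """Build a column-not-found message with an optional 'Did you mean?' hint."""
    # get_close_matches returns the best candidates, most similar first,
    # and drops anything whose similarity ratio is below the cutoff.
    matches = difflib.get_close_matches(name, columns, n=1, cutoff=cutoff)
    message = f"Column '{name}' not found."
    if matches:
        message += f" Did you mean '{matches[0]}'?"
    return message


# Assumed column names, for illustration only.
columns = ["name", "department", "city"]
print(suggest_column("nme", columns))
# → Column 'nme' not found. Did you mean 'name'?
```

When nothing clears the cutoff, the message is returned without a suggestion, so an unrelated typo does not produce a misleading hint.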

🤖 Generated with Claude Code

subhayu99 and others added 2 commits March 17, 2026 15:15
- Add HAVING clause, SELECT DISTINCT, FULL OUTER JOIN, IS NULL/IS NOT NULL
- Fix NULL three-valued logic and empty GROUP BY aggregate defaults
- Deduplicate condition evaluation into shared utility (5 copies → 1)
- Narrow broad exception handlers to specific types
- Add predicate pushdown for JOIN queries (left table only)
- Add condition reordering optimizer (equality-first heuristic)
- Add column-not-found fuzzy matching with suggestions
- Sync version from importlib.metadata (was stuck at 0.1.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover SELECT DISTINCT, FULL OUTER JOIN, HAVING, IS NULL/IS NOT NULL,
empty GROUP BY defaults, NULL three-valued logic, column-not-found
fuzzy matching, condition reordering, and parser edge cases.

Also fix pandas executor to support DISTINCT, HAVING, FULL OUTER JOIN,
and IS NULL/IS NOT NULL operators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>