Add research autolearn, counterfactual twin, and deep backfill tooling#13
Add research autolearn, counterfactual twin, and deep backfill tooling#13ArielB1980 merged 8 commits intomainfrom
Conversation
Automate post-cycle learning overrides, campaign stop gating, and blocker/funnel diagnostics while wiring replay research and live decision paths with richer telemetry and tests so non-baseline discovery can be debugged deterministically. Made-with: Cursor
Introduce deterministic decision-tape scoring utilities for candidate comparisons and add focused tests so promotion decisions can be grounded in measured utility uplift. Made-with: Cursor
Provide a CoinAPI OHLCV client and deep backfill driver so missing higher-timeframe history can be filled reliably when Kraken coverage is incomplete. Made-with: Cursor
Validate that symbol override resolution applies only to matching symbols and preserves base config values for non-target symbols. Made-with: Cursor
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| # Stop processing (no new entries while kill switch is active) | ||
| return | ||
| # Stop processing (no new entries while kill switch is active) | ||
| return |
There was a problem hiding this comment.
Replay bypass skips kill switch return, allowing continued tick processing
High Severity
When replay_fail_open is True (DATA_FAILURE kill switch in replay mode), the code logs a warning but does not return. Execution falls through past the entire kill switch block into normal tick processing (order timeouts, data health checks, analysis, etc.). The original code always returned after handling an active kill switch. For non-DATA_FAILURE kill switch reasons in replay mode, the else branch handles them and returns — but the DATA_FAILURE path silently continues. This means replay ticks proceed with an active kill switch, which contradicts the safety invariant that no new entries occur while the kill switch is active.
| requested_leverage=requested_leverage, | ||
| spec_summary=None, | ||
| ) | ||
| continue |
There was a problem hiding this comment.
NO_SPEC bypass proceeds without spec causing downstream failures
Medium Severity
When _replay_relaxed_signal_gates is True and no instrument spec is found, the code logs and continues instead of skipping. However, spec remains None and is used downstream for step size calculation, min-size enforcement, and position sizing. Code later references spec.size_step, spec.tick_size, etc., which will raise AttributeError on None. The bypass only logs — it doesn't provide a fallback spec object.
| return raw | ||
| cached = self._symbol_resolution_cache.get(raw) | ||
| if cached: | ||
| return cached |
There was a problem hiding this comment.
Symbol resolution cache returns empty string as cache miss
Low Severity
_resolve_market_symbol checks if cached: which treats an empty string as a cache miss. While the early return for empty raw prevents empty strings from being cached, if a symbol resolves to a non-empty string that's falsy in some edge case, the cache would be bypassed. More importantly, if any resolved symbol happens to be a string that Python considers falsy, the cache lookup would fail and re-resolve every time, causing a performance issue.
| -int(r["candidate_open_count"]), | ||
| ), | ||
| reverse=True, | ||
| ) |
There was a problem hiding this comment.
Counterfactual twin sort order inverts candidate ranking
Medium Severity
evaluate_candidate_batch sorts by a tuple including -int(r["candidate_open_count"]) as the third key with reverse=True. The negative sign combined with reverse sort means candidates with more opens rank lower (double negation). If the intent is to rank higher-open-count candidates higher when uplift and delta are tied, the negation produces the opposite order.
Restore kill-switch early return semantics in replay fail-open mode, prevent NO_SPEC ablation from dereferencing missing specs, correct counterfactual candidate tie-break sorting, harden symbol-resolution cache lookup, and ensure GitHub Actions installs pytest before running unit tests. Made-with: Cursor
Install pytest-asyncio in replay gate workflow so async unit tests collect successfully under GitHub Actions. Made-with: Cursor
Use safe decimal coercion for strategy config values, restore stop order type used by missing-stop placement, normalize precision drift in SMC golden reasoning comparisons, and complete TP backfill test fixtures with explicit signal cooldown config. Made-with: Cursor
Normalize numeric literals in SMC reasoning to 9 decimal places so snapshot comparisons are stable across minor platform floating-point representation differences. Made-with: Cursor


Summary
Test plan
uv run pytest tests/unit/test_research_filter_report.py tests/unit/test_research_autolearn.py tests/unit/test_research_campaign_gate.py tests/unit/test_research_evaluator_eligibility.py tests/unit/test_research_evaluator_replay_timeout.py tests/unit/test_counterfactual_twin.pygit diff --stat main...HEADfor scoped changesMade with Cursor
Note
Medium Risk
Medium risk because it touches core trading/replay pathways (
LiveTrading,ExecutionGateway, replay exchange sim) with new replay-mode bypasses and logging/instrumentation that could affect behavior if accidentally enabled in non-replay environments.Overview
Adds a new continuous research toolchain: scripts to run campaign-level stop gating, auto-generate next-cycle env overrides from prior runs, summarize ablation batches, and produce blocker/funnel reports from run logs and recorded decision events.
Introduces counterfactual decision-tape instrumentation by emitting
COUNTERFACTUAL_DECISION/COUNTERFACTUAL_ACTIONevents with stabledecision_ids, plus a newcounterfactual_twinmodule to stitch these events into a deterministic tape and score parameter overrides via utility-uplift ranking.Expands the replay/research harness for higher determinism and throughput: symbol resolution across spot/futures naming, explicit order-update polling, forced flatten-at-end, optional DB mocking, relaxed replay gates/limits via env flags, and additional replay metrics (e.g., paused ticks, trades closed).
Adds CoinAPI-backed deep backfill support: a new
CoinAPIClient,DataAcquisition.fetch_spot_historical(..., source=...)routing, and adeep_backfill_daily.pyutility for large lookback candle ingestion; also updates research control/nightly scripts to default to replay-oriented settings and add continuous-daemon commands.Written by Cursor Bugbot for commit 31003e9. This will update automatically on new commits. Configure here.