Add research autolearn, counterfactual twin, and deep backfill tooling by ArielB1980 · Pull Request #13 · ArielB1980/Kbot

ArielB1980 · 2026-03-18T22:32:58Z

Summary

add continuous research autolearn/campaign-gate/filter/funnel tooling and wire richer replay/live gate instrumentation to make blocker diagnosis deterministic
add counterfactual twin utilities and tests for decision-tape uplift comparison and batch ranking of candidates
add CoinAPI-backed deep backfill path plus related data acquisition support to improve research candle completeness
extend targeted unit coverage for research evaluator eligibility/timeout behavior and per-symbol config resolver behavior

Test plan

uv run pytest tests/unit/test_research_filter_report.py tests/unit/test_research_autolearn.py tests/unit/test_research_campaign_gate.py tests/unit/test_research_evaluator_eligibility.py tests/unit/test_research_evaluator_replay_timeout.py tests/unit/test_counterfactual_twin.py
reviewed git diff --stat main...HEAD for scoped changes
run wider integration suite before merge (recommended)

Made with Cursor

Note

Medium Risk
Medium risk because it touches core trading/replay pathways (LiveTrading, ExecutionGateway, replay exchange sim) with new replay-mode bypasses and logging/instrumentation that could affect behavior if accidentally enabled in non-replay environments.

Overview
Adds a new continuous research toolchain: scripts to run campaign-level stop gating, auto-generate next-cycle env overrides from prior runs, summarize ablation batches, and produce blocker/funnel reports from run logs and recorded decision events.

Introduces counterfactual decision-tape instrumentation by emitting COUNTERFACTUAL_DECISION/COUNTERFACTUAL_ACTION events with stable decision_ids, plus a new counterfactual_twin module to stitch these events into a deterministic tape and score parameter overrides via utility-uplift ranking.

Expands the replay/research harness for higher determinism and throughput: symbol resolution across spot/futures naming, explicit order-update polling, forced flatten-at-end, optional DB mocking, relaxed replay gates/limits via env flags, and additional replay metrics (e.g., paused ticks, trades closed).

Adds CoinAPI-backed deep backfill support: a new CoinAPIClient, DataAcquisition.fetch_spot_historical(..., source=...) routing, and a deep_backfill_daily.py utility for large lookback candle ingestion; also updates research control/nightly scripts to default to replay-oriented settings and add continuous-daemon commands.

Written by Cursor Bugbot for commit 31003e9. This will update automatically on new commits.

Automate post-cycle learning overrides, campaign stop gating, and blocker/funnel diagnostics while wiring replay research and live decision paths with richer telemetry and tests so non-baseline discovery can be debugged deterministically. Made-with: Cursor

Introduce deterministic decision-tape scoring utilities for candidate comparisons and add focused tests so promotion decisions can be grounded in measured utility uplift. Made-with: Cursor

Provide a CoinAPI OHLCV client and deep backfill driver so missing higher-timeframe history can be filled reliably when Kraken coverage is incomplete. Made-with: Cursor

Validate that symbol override resolution applies only to matching symbols and preserves base config values for non-target symbols. Made-with: Cursor
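The per-symbol override behaviour described in the last commit message can be sketched as follows. This is a minimal illustration and not the repository's resolver: the function name `resolve_symbol_config`, the config keys, and the symbols are all hypothetical.

```python
from typing import Any, Dict, Mapping


def resolve_symbol_config(
    base: Mapping[str, Any],
    overrides: Mapping[str, Mapping[str, Any]],
    symbol: str,
) -> Dict[str, Any]:
    """Return base config with overrides applied only for a matching symbol.

    Hypothetical helper: a copy is returned so the base mapping is never mutated.
    """
    resolved = dict(base)
    resolved.update(overrides.get(symbol, {}))
    return resolved


# Hypothetical config values, for illustration only.
base_cfg = {"risk_pct": 0.01, "cooldown_minutes": 60}
symbol_overrides = {"ETH/USD": {"risk_pct": 0.005}}

eth_cfg = resolve_symbol_config(base_cfg, symbol_overrides, "ETH/USD")
btc_cfg = resolve_symbol_config(base_cfg, symbol_overrides, "BTC/USD")
```

Here `eth_cfg` picks up the overridden `risk_pct` while keeping the base `cooldown_minutes`, and `btc_cfg` equals the base config because no override matches.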

cursor

Cursor Bugbot has reviewed your changes and found 4 potential issues.


cursor · 2026-03-18T22:34:37Z

-            # Stop processing (no new entries while kill switch is active)
-            return
+                # Stop processing (no new entries while kill switch is active)
+                return


Replay bypass skips kill switch return, allowing continued tick processing

High Severity

When replay_fail_open is True (DATA_FAILURE kill switch in replay mode), the code logs a warning but does not return. Execution falls through past the entire kill switch block into normal tick processing (order timeouts, data health checks, analysis, etc.). The original code always returned after handling an active kill switch. For non-DATA_FAILURE kill switch reasons in replay mode, the else branch handles them and returns — but the DATA_FAILURE path silently continues. This means replay ticks proceed with an active kill switch, which contradicts the safety invariant that no new entries occur while the kill switch is active.
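The control-flow concern can be reproduced in isolation. The sketch below assumes a simplified tick loop; `KillSwitch`, `TickLoop`, and `process_tick` are hypothetical names, and only `replay_fail_open` and the `DATA_FAILURE` reason come from the review. It shows one way to keep the early return on both branches.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KillSwitch:
    """Hypothetical stand-in for the kill switch state."""
    active: bool = False
    reason: Optional[str] = None


@dataclass
class TickLoop:
    """Hypothetical tick processor used only to illustrate control flow."""
    kill_switch: KillSwitch
    replay_mode: bool = False
    log: List[str] = field(default_factory=list)

    def process_tick(self) -> bool:
        """Return True if normal tick processing ran, False if it was skipped."""
        if self.kill_switch.active:
            replay_fail_open = (
                self.replay_mode and self.kill_switch.reason == "DATA_FAILURE"
            )
            if replay_fail_open:
                self.log.append("replay fail-open: DATA_FAILURE kill switch noted")
            else:
                self.log.append(f"kill switch active: {self.kill_switch.reason}")
            # Return on BOTH branches, so no new entries occur while the
            # kill switch is active.
            return False
        self.log.append("analysis ran")
        return True
```

With the `return False` placed after the `if`/`else` rather than inside the `else` only, the fail-open path differs from the normal path in what it logs, not in whether the tick proceeds.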

cursor · 2026-03-18T22:34:37Z

+                            requested_leverage=requested_leverage,
+                            spec_summary=None,
+                        )
+                        continue


NO_SPEC bypass proceeds without spec causing downstream failures

Medium Severity

When _replay_relaxed_signal_gates is True and no instrument spec is found, the code logs the event and proceeds with the signal instead of skipping it. However, spec remains None and is used downstream for step size calculation, min-size enforcement, and position sizing. Later code references spec.size_step, spec.tick_size, etc., which will raise AttributeError on None. The bypass only logs; it does not provide a fallback spec object.
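A minimal sketch of a guard that avoids the `None` dereference is shown below. It assumes a simplified sizing step: `InstrumentSpec`, `size_orders`, and the skip labels are hypothetical, and only the `size_step` and `tick_size` attribute names are taken from the review. The relaxed flag changes how the skip is labelled, never whether sizing runs without a spec.

```python
from dataclasses import dataclass
from decimal import ROUND_DOWN, Decimal
from typing import Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class InstrumentSpec:
    """Hypothetical spec carrying the two fields named in the review."""
    size_step: Decimal
    tick_size: Decimal


def size_orders(
    requests: Mapping[str, Decimal],
    specs: Mapping[str, InstrumentSpec],
    relaxed_gates: bool,
) -> Tuple[Dict[str, Decimal], List[Tuple[str, str]]]:
    """Round requested sizes down to the step size, skipping symbols with no spec."""
    sized: Dict[str, Decimal] = {}
    skipped: List[Tuple[str, str]] = []
    for symbol, raw_size in requests.items():
        spec = specs.get(symbol)
        if spec is None:
            # Relaxed gates only affect the recorded label; the symbol is
            # still skipped because every later step needs a real spec.
            label = "NO_SPEC_RELAXED" if relaxed_gates else "NO_SPEC"
            skipped.append((symbol, label))
            continue
        steps = (raw_size / spec.size_step).to_integral_value(rounding=ROUND_DOWN)
        sized[symbol] = steps * spec.size_step
    return sized, skipped
```

An alternative design would supply an explicit fallback spec in replay mode; that choice depends on whether replay results with guessed step sizes are considered meaningful.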

cursor · 2026-03-18T22:34:37Z

+            return raw
+        cached = self._symbol_resolution_cache.get(raw)
+        if cached:
+            return cached


Symbol resolution cache returns empty string as cache miss

Low Severity

_resolve_market_symbol uses the check if cached:, which treats an empty string as a cache miss. In Python the empty string is the only falsy str value, and the early return for empty raw keeps empty inputs out of the cache, so the check misfires only if a symbol ever resolves to an empty string. In that case the cached entry would be ignored and the symbol re-resolved on every call, causing a performance issue. An explicit is not None comparison would avoid the ambiguity.
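The difference between a truthiness check and an explicit `None` check can be shown with a small stand-in. `SymbolResolver`, its constructor, and the `resolve_calls` counter are hypothetical; only `_symbol_resolution_cache` and the lookup pattern mirror the diff.

```python
from typing import Callable, Dict


class SymbolResolver:
    """Hypothetical resolver; only the cache-lookup pattern mirrors the diff."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self._resolve = resolve
        self._symbol_resolution_cache: Dict[str, str] = {}
        self.resolve_calls = 0  # counts cache misses, for demonstration

    def resolve_market_symbol(self, raw: str) -> str:
        if not raw:
            return raw
        cached = self._symbol_resolution_cache.get(raw)
        # "is not None" distinguishes a missing key from a cached empty string.
        if cached is not None:
            return cached
        self.resolve_calls += 1
        resolved = self._resolve(raw)
        self._symbol_resolution_cache[raw] = resolved
        return resolved


# A resolver that (pathologically) maps every symbol to an empty string.
resolver = SymbolResolver(lambda symbol: "")
resolver.resolve_market_symbol("BTC/USD")
resolver.resolve_market_symbol("BTC/USD")
```

With the explicit check, the second call is served from the cache and `resolver.resolve_calls` stays at 1; with `if cached:` it would climb on every call.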

cursor · 2026-03-18T22:34:37Z

+            -int(r["candidate_open_count"]),
+        ),
+        reverse=True,
+    )


Counterfactual twin sort order inverts candidate ranking

Medium Severity

evaluate_candidate_batch sorts by a tuple including -int(r["candidate_open_count"]) as the third key with reverse=True. The negative sign combined with reverse sort means candidates with more opens rank lower (double negation). If the intent is to rank higher-open-count candidates higher when uplift and delta are tied, the negation produces the opposite order.
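The effect of the negated key under `reverse=True` can be checked on a toy batch. The rows and the `uplift` and `delta` key names below are hypothetical; only `candidate_open_count` and the shape of the sort call come from the diff.

```python
from typing import Any, Dict, List

# Hypothetical candidate rows: "a" and "b" tie on uplift and delta.
rows: List[Dict[str, Any]] = [
    {"name": "a", "uplift": 1.0, "delta": 0.5, "candidate_open_count": 3},
    {"name": "b", "uplift": 1.0, "delta": 0.5, "candidate_open_count": 7},
    {"name": "c", "uplift": 2.0, "delta": 0.1, "candidate_open_count": 1},
]


def rank(batch: List[Dict[str, Any]], negate_open_count: bool) -> List[str]:
    """Sort descending by (uplift, delta, +/- open count) and return the names."""
    sign = -1 if negate_open_count else 1
    ordered = sorted(
        batch,
        key=lambda r: (
            float(r["uplift"]),
            float(r["delta"]),
            sign * int(r["candidate_open_count"]),
        ),
        reverse=True,
    )
    return [r["name"] for r in ordered]


negated_order = rank(rows, negate_open_count=True)   # ['c', 'a', 'b']
plain_order = rank(rows, negate_open_count=False)    # ['c', 'b', 'a']
```

With the negation, the tied candidate with fewer opens ("a") ranks above the one with more opens ("b"); dropping the negation reverses that tie-break. Which order is correct depends on the intended ranking, which the review leaves as a question.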

Restore kill-switch early return semantics in replay fail-open mode, prevent NO_SPEC ablation from dereferencing missing specs, correct counterfactual candidate tie-break sorting, harden symbol-resolution cache lookup, and ensure GitHub Actions installs pytest before running unit tests. Made-with: Cursor

Install pytest-asyncio in replay gate workflow so async unit tests collect successfully under GitHub Actions. Made-with: Cursor

Use safe decimal coercion for strategy config values, restore stop order type used by missing-stop placement, normalize precision drift in SMC golden reasoning comparisons, and complete TP backfill test fixtures with explicit signal cooldown config. Made-with: Cursor

Normalize numeric literals in SMC reasoning to 9 decimal places so snapshot comparisons are stable across minor platform floating-point representation differences. Made-with: Cursor
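A minimal sketch of this kind of normalization, assuming the reasoning text is a plain string with decimal literals embedded in it. The function name, the regular expression, and the sample strings are hypothetical and are not taken from the repository.

```python
import re

# Matches simple decimal literals such as 0.30000000000000004 or -12.5.
# Assumption: the text contains no dotted version strings like "1.2.3".
_DECIMAL_LITERAL = re.compile(r"-?\d+\.\d+")


def normalize_numeric_literals(text: str, places: int = 9) -> str:
    """Rewrite every decimal literal in text with a fixed number of places."""

    def _fixed(match: "re.Match[str]") -> str:
        return f"{float(match.group(0)):.{places}f}"

    return _DECIMAL_LITERAL.sub(_fixed, text)


# Two renderings of the same reasoning that differ only in float noise.
platform_a = "order block at 1.2345678901234, atr 0.30000000000000004"
platform_b = "order block at 1.2345678901, atr 0.3"

normalized_a = normalize_numeric_literals(platform_a)
normalized_b = normalize_numeric_literals(platform_b)
```

After normalization both strings read "order block at 1.234567890, atr 0.300000000", so a snapshot comparison no longer depends on the last few floating-point digits.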

ArielB1980 added 4 commits March 18, 2026 23:31

Add counterfactual twin uplift analysis and batch ranking tests.

774de0d


Add CoinAPI-backed deep backfill for research candle completeness.

073165a


Add config resolver tests for per-symbol strategy and risk overrides.

31003e9


cursor bot reviewed Mar 18, 2026


ArielB1980 added 4 commits March 18, 2026 23:38

Fix CI missing pytest-asyncio dependency.

ff674d5


Relax golden reasoning precision tolerance for CI.

8b56b66


ArielB1980 merged commit ede5d7d into main Mar 19, 2026
4 checks passed

ArielB1980 merged 8 commits into main from chore/split-research-automation
