fix(research): unblock optimizer, fix phantom trades, parallelize #16
Merged
ArielB1980 merged 15 commits into main on Apr 5, 2026
…tion, add bounds
Root cause: REPLAY_OVERRIDE_* env vars exported by the continuous daemon
were overriding the very config parameters research was trying to mutate,
making all candidates behave identically to baseline ("uninformative surface"
after 4 probes, every symbol, every cycle — 1,503 cycles with 0 results).
Fix 1: Strip REPLAY_OVERRIDE_* env vars during replay evaluation so that
config_overrides from the harness actually reach the strategy engine.
Fix 2: Raise uninformative_surface_probe_count from 4 to 12 so the harness
tries more mutations before giving up on a symbol.
Fix 3: When baseline produces 0 trades, use aggressive gate-lowering
exploration for the first 6 iterations to find the signal region.
Fix 5: Add PARAMETER_BOUNDS with min/max per parameter to prevent
degenerate values. _mutate_params now clamps to bounds.
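Fix 1 can be sketched as a context manager that hides the daemon's overrides for the duration of one replay evaluation. This is only an illustration under stated assumptions: the helper name `without_replay_overrides` and the example variable name are hypothetical, and the real harness may strip the variables differently.

```python
import os
from contextlib import contextmanager

PREFIX = "REPLAY_OVERRIDE_"  # prefix named in the commit message


@contextmanager
def without_replay_overrides(environ=os.environ, prefix=PREFIX):
    """Temporarily remove every variable starting with `prefix`.

    Hypothetical helper: the removed variables are restored on exit so the
    daemon's own environment is left untouched after the evaluation.
    """
    saved = {k: v for k, v in environ.items() if k.startswith(prefix)}
    for key in saved:
        del environ[key]
    try:
        yield saved
    finally:
        environ.update(saved)


# Usage with a plain dict standing in for os.environ.
env = {"REPLAY_OVERRIDE_FIB_PROXIMITY_BPS": "60", "PATH": "/usr/bin"}
with without_replay_overrides(env):
    # Inside the block only config_overrides from the harness would apply.
    visible_during_replay = dict(env)
```

Restoring the variables afterwards keeps the change local to the evaluation, which matters if the same process later runs code that expects the overrides.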
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
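The clamping described in Fix 5 might look roughly like the sketch below. The bounds table, the parameter names, and the multiplicative mutation rule are all assumptions made for illustration; the PR text does not show the actual PARAMETER_BOUNDS contents or the body of _mutate_params.

```python
import random

# Hypothetical bounds: (min, max) per parameter. Only the idea of a
# per-parameter min/max comes from the commit message.
PARAMETER_BOUNDS = {
    "strategy.fib_proximity_bps": (20.0, 160.0),
    "strategy.example_stop_atr": (0.5, 3.0),
}


def clamp(name, value, bounds=PARAMETER_BOUNDS):
    """Clip `value` into the [min, max] range registered for `name`."""
    lo, hi = bounds[name]
    return max(lo, min(hi, value))


def mutate_params(params, rng, scale=0.25, bounds=PARAMETER_BOUNDS):
    """Perturb each parameter by up to +/- `scale`, then clamp to its bounds.

    Parameters without a registered bound are passed through unclamped.
    """
    mutated = {}
    for name, value in params.items():
        proposal = value * (1.0 + rng.uniform(-scale, scale))
        mutated[name] = clamp(name, proposal, bounds) if name in bounds else proposal
    return mutated


candidate = mutate_params({"strategy.fib_proximity_bps": 150.0}, random.Random(0))
```

Clamping after mutation, rather than rejecting out-of-range proposals, means every iteration still yields a usable candidate.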
Phantom trade fixes (6):
- Add 30-min dedup guard in phantom position import to prevent restart-loop duplicates
- Add 6-hour age gate on trade recording to skip stale historical artifacts
- Re-classify phantom positions after Case D purge in production takeover
- Mark stale zero-qty positions as trade_recorded on persistence load
- Isolate replay harness position DB via POSITION_PERSISTENCE_PATH env var
- Harden systemd restart policy (on-failure, 30s delay, burst limits)

Research pipeline fixes (4):
- Widen fib_proximity_bps 60→120 bps (max 80→160): singular signal chokepoint
- Lower neutral score thresholds (tight_smc 65→55, wide_structure 60→50) for environments without 200-day EMA data
- Fix instrument spec registry clobbering: merge duplicate _index() calls into single call combining both BTC and XBT-aliased spec formats
- Replace synthetic min_size=1 (whole unit) with realistic per-asset minimums (BTC: 0.0001, ETH: 0.001, etc.) so position sizing works with small equity

Research allowlist additions:
- strategy.fib_proximity_bps, strategy.min_score_tight_smc_neutral, strategy.min_score_wide_structure_neutral now optimizable

Control test script added for March 12 validation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
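The first two phantom trade guards are simple time checks. The sketch below shows one way to express them; the function names and the in-memory `last_import_at` map are hypothetical, and only the 30-minute and 6-hour thresholds come from the commit message.

```python
from datetime import datetime, timedelta, timezone

DEDUP_WINDOW = timedelta(minutes=30)   # dedup guard on phantom position import
MAX_TRADE_AGE = timedelta(hours=6)     # age gate on trade recording


def should_import_phantom(symbol, last_import_at, now):
    """Allow an import only if `symbol` was not imported within the dedup window.

    `last_import_at` is a hypothetical dict of symbol -> last import time;
    it is updated in place when an import is allowed.
    """
    last = last_import_at.get(symbol)
    if last is not None and now - last < DEDUP_WINDOW:
        return False
    last_import_at[symbol] = now
    return True


def should_record_trade(closed_at, now):
    """Skip stale historical artifacts: record only trades closed recently."""
    return now - closed_at <= MAX_TRADE_AGE


now = datetime(2026, 4, 5, 12, 0, tzinfo=timezone.utc)
seen = {}
first = should_import_phantom("BTC", seen, now)                        # allowed
repeat = should_import_phantom("BTC", seen, now + timedelta(minutes=5))  # blocked
```

In a restart loop the second call is the one that matters: the position is still present on the exchange, but the guard refuses to import it again until the window has passed.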
The research optimizer was only tuning 18 entry gate parameters while the core problem was fundamentally bad exits: 6.5% win rate, 100% stop-loss exits, 36-second average hold times. Stops were 0.15-0.30 ATR (hit by noise instantly), and TP/trailing/risk params were completely locked.

Changes:
- Expand allowlist from 18 → 33 optimizable parameters
- Add exit mechanics: TP1/TP2 R-multiples, close percentages, runner %, trailing stop ATR multiplier
- Add risk sizing: risk_per_trade_pct, target_leverage
- Add cost constraints: tight_smc_cost_cap_bps, min_rr_multiple, fee_edge_multiple_k
- Widen stop loss bounds significantly (tight_smc max from 1.5 → 3.0 ATR, wide_structure max from 2.5 → 4.0 ATR)
- Handle Optional config sections (multi_tp) in param reader with bound midpoint fallback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
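The last bullet, reading a parameter from an Optional config section, could be handled along these lines. The dataclasses, the dotted path, and the bounds below are hypothetical stand-ins; only the "fall back to the midpoint of the bounds when the section is missing" behaviour is taken from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MultiTP:  # hypothetical stand-in for the real multi_tp section
    tp1_r_multiple: float = 1.0


@dataclass
class Config:  # hypothetical top-level config
    multi_tp: Optional[MultiTP] = None


BOUNDS = {"multi_tp.tp1_r_multiple": (0.5, 3.0)}  # illustrative values


def read_param(config, dotted, bounds=BOUNDS):
    """Walk a dotted path; if any segment is missing or None, return the bound midpoint."""
    node = config
    for part in dotted.split("."):
        node = getattr(node, part, None)
        if node is None:
            lo, hi = bounds[dotted]
            return (lo + hi) / 2.0
    return node


missing = read_param(Config(), "multi_tp.tp1_r_multiple")             # section absent
present = read_param(Config(MultiTP(2.0)), "multi_tp.tp1_r_multiple")  # section set
```

The midpoint is a neutral starting value: it gives the optimizer room to move in either direction without first tripping a bound.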
Add important flag to _notify(). Default is quiet (log only). Only these events send to Telegram: new best candidate, convergence, run completed, replay gate results, and promotion queued.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
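A minimal sketch of the quiet-by-default behaviour follows. The signature is an assumption (the real _notify() is not shown in this PR text), and the `send` callable stands in for whatever actually delivers the Telegram message.

```python
import logging

logger = logging.getLogger("research")


def notify(message, important=False, send=None):
    """Always log; forward to the external channel only when flagged important.

    `send` is a hypothetical callable representing the Telegram sender.
    Returns True if the message was forwarded.
    """
    logger.info(message)
    if important and send is not None:
        send(message)
        return True
    return False


sent = []
notify("probe 3/12 finished", send=sent.append)                    # log only
notify("new best candidate", important=True, send=sent.append)     # forwarded
```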
…eneration
1. Wrong config section in allowlist: tight_smc_cost_cap_bps, tight_smc_min_rr_multiple, fee_edge_multiple_k live on RiskConfig, not StrategyConfig. Every replay crashed with "StrategyConfig has no field" → -999% sentinel → all symbols skipped. Fixed prefix from strategy.* to risk.*.
2. Holdout sentinel (-999%) caused immediate symbol skip: when holdout window had 0 trades, evaluator returned -999% sentinel which triggered _is_non_informative_baseline → optimizer never tried any mutations. Changed to return 0% with penalty score so optimizer can explore looser parameters.
3. partial_data_non_comparable produced -999% sentinel: symbols with incomplete 15m coverage (30 days vs 120 requested) hit this path. Changed to return 0-trade metrics instead of hard failure.
4. Score thresholds too high for neutral bias: without EMA200 (needs 200 candles, only 120 available), all signals get neutral bias → HTF alignment capped at 10, EMA slope = 0 → max realistic score ~60. Lowered neutral thresholds: tight_smc_neutral 55→35, wide_structure_neutral 50→30.
5. 4H_STRUCTURE_REQUIRED hard gate blocked ~25% of signals: 1H structure fallback existed in code but was disabled. Enabled structure_fallback_enabled=true with reduced penalty (15→5 points).

Result: research now generates 9+ trades per baseline eval (was 0).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
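The change in item 2 separates "no trades in the holdout window" from "evaluation failed". The sketch below illustrates that distinction under stated assumptions: the metric field names, the penalty value, and the scoring rule are hypothetical, and only the -999% sentinel and the "0% with penalty score" outcome come from the commit message.

```python
SENTINEL_RETURN_PCT = -999.0  # hard-failure marker named in the commit message
ZERO_TRADE_PENALTY = -1.0     # hypothetical penalty score for an empty window


def holdout_metrics(trade_returns_pct):
    """Summarise a holdout window; an empty window is a weak result, not a failure."""
    if not trade_returns_pct:
        return {"return_pct": 0.0, "trades": 0, "score": ZERO_TRADE_PENALTY}
    total = sum(trade_returns_pct)
    return {"return_pct": total, "trades": len(trade_returns_pct), "score": total}


def is_hard_failure(metrics):
    """Only the sentinel marks a symbol as non-informative in this sketch."""
    return metrics["return_pct"] <= SENTINEL_RETURN_PCT


empty = holdout_metrics([])
```

Because the empty window scores below any profitable candidate but above the sentinel, the optimizer keeps mutating toward looser gates instead of skipping the symbol.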
The evaluator passed disable_db_mock=True when DATABASE_URL was set, causing the replay harness to write simulated trades directly into the production trades table. This produced 4,517 phantom trades with <60s duration and -$54K fake PnL.

Fix: always use the DB mock during research replay. Replay reads candle data from CSV files, not the DB, so it never needs real DB writes.

Also cleaned 4,517 phantom trade records from production DB.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split symbol optimization across N-1 workers (one per CPU minus OS/collector). Each worker gets isolated state, logs, and output dirs. Results are merged after all workers complete so post-run hooks work unchanged. On 4-CPU droplet: 3 workers × 2 symbols = ~3x faster than sequential.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
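The worker split and merge can be sketched as follows. The function names, the round-robin assignment, and the dict-shaped results are assumptions for illustration; the commit message only states N-1 workers, isolated per-worker state, and a merge once all workers finish.

```python
import os


def worker_count(cpu_count=None, reserved=1):
    """N-1 workers: leave `reserved` CPUs for the OS and collector, never fewer than 1."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus - reserved)


def partition(symbols, n_workers):
    """Round-robin split so each worker gets a near-equal share of symbols."""
    return [symbols[i::n_workers] for i in range(n_workers)]


def merge_results(per_worker):
    """Combine per-worker result dicts (symbol -> result) into one mapping."""
    merged = {}
    for result in per_worker:
        merged.update(result)
    return merged


# The 4-CPU case from the commit message: 3 workers x 2 symbols.
symbols = ["BTC", "ETH", "SOL", "XRP", "ADA", "DOT"]  # illustrative symbol list
shards = partition(symbols, worker_count(cpu_count=4))
```

Since each symbol appears in exactly one shard, the merge step never has to resolve conflicting results for the same symbol.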
Move StartLimitIntervalSec/StartLimitBurst to [Unit] section (correct placement per systemd docs). Add one-shot trading_review.py diagnostic.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

uv sync --dev uses [dependency-groups] not [project.optional-dependencies]. pytest was only in the latter, causing CI to fail with "Failed to spawn: pytest".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…dels shim
These types were referenced by position_manager_v2.py and position_evaluator.py but never defined in the shim module, causing ImportError in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The parallel worker monitoring loop was dying every ~10 minutes because `[[ -n "" ]] && VAR="val"` returns exit 1 when the test fails, which under `set -e` kills the entire daemon. Changed && chains to if/then, wrapped loop in set +e, and added error tolerance to state merge step.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three changes to fix replay workers using only 30% CPU:
1. Disable OHLCV fetcher rate limiting in replay (semaphore 8→1000,
min_delay 200ms→0, retries 3→1) — exchange sim is in-memory, no
real API to throttle against. This was the main bottleneck (95% of
time spent in epoll_wait).
2. Set LOG_LEVEL=WARNING for research workers — INFO logging was
producing 280MB/worker of I/O, choking disk and CPU.
3. Support ${VAR:-default} syntax in config.yaml env var expansion.
Result: CPU utilization 16%→71%, per-worker 30%→75%, load 1.3→3.3.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
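Change 3 above adds shell-style defaults to env var expansion in config.yaml. A minimal sketch of that expansion is shown below; the function name and regex are assumptions rather than the project's loader, and leaving an unset `${VAR}` without a default untouched is a choice made here, not something the commit message specifies.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}.
_ENV_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def expand_env(text, environ=os.environ):
    """Expand ${VAR} and ${VAR:-default} in `text`.

    Follows the shell's `:-` rule: the default applies when the variable is
    unset or empty.
    """
    def replace(match):
        name, default = match.group("name"), match.group("default")
        value = environ.get(name)
        if value:
            return value
        if default is not None:
            return default
        # No default given: keep an empty value, leave an unset reference as-is.
        return value if value is not None else match.group(0)

    return _ENV_PATTERN.sub(replace, text)


line = expand_env("log_level: ${LOG_LEVEL:-INFO}", {"LOG_LEVEL": "WARNING"})
```

With this in place a research worker can export LOG_LEVEL=WARNING while other processes reading the same file fall back to the default.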
Two changes to improve research result quality:
1. Default objective mode net_pnl_only → risk_adjusted. The composite score includes drawdown penalty (-0.8), Sharpe/Sortino (0.35), and win rate (0.1), which is critical for avoiding high-variance curve fits that would fail live (19% win rate baseline).
2. Promotion minimum trades 10 → 20. With 120-day windows and 30% holdout, 10 trades is too few for statistical confidence.

Also fixed .env on production server:
- RESEARCH_CONT_WINDOW_OFFSETS: 0 → 0,90,180 (3 walk-forward windows)
- RESEARCH_CONT_REPLAY_TIMEFRAMES: added 1m back
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two new post-cycle diagnostics that test whether SMC signal logic has genuine edge, independent of parameter optimization:
1. Signal Directional Accuracy: intercepts every signal during replay and measures if price moves in the predicted direction at 1h/4h/24h. Reports hit rate, p-value (binomial test vs 50%), and breakdown by setup type (OB/FVG/BOS/TREND) and direction.
2. Random Entry Baseline: runs N replay trials with random entries at the same frequency as the real strategy but using the same risk management stack. If random entries produce similar returns, the signals have no edge.

Both run automatically after each research cycle (alongside the existing counterfactual twin) with timeout protection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
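The p-value in the directional accuracy diagnostic can be computed with an exact binomial test against a 50% hit rate. The sketch below uses only the standard library and assumes a two-sided test; the PR text does not say whether the diagnostic is one- or two-sided or which library it uses.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def binomial_two_sided_p(hits, n, p0=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely
    than the observed hit count under the null hit rate `p0` (0 < p0 < 1)."""
    pmf = [binomial_pmf(k, n, p0) for k in range(n + 1)]
    observed = pmf[hits]
    # Small relative tolerance so symmetric outcomes are not lost to rounding.
    tail = sum(p for p in pmf if p <= observed * (1.0 + 1e-9))
    return min(1.0, tail)


# 60 correct directions out of 100 signals, tested against a coin flip.
p_value = binomial_two_sided_p(60, 100)
```

A hit rate that looks good on a small sample can still be consistent with chance, which is why the p-value is reported next to the raw hit rate.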
Summary
- …strategy.*→risk.*), holdout sentinel (-999%), partial data sentinel, score thresholds too high for neutral bias, 4H structure gate with no fallback
- …DATABASE_URL was set (disable_db_mock=True on prod). Changed to always mock. Cleaned 4,517+ phantom trades from production DB.

Commits
- fix(research): unblock optimizer - strip env overrides, widen exploration, add bounds
- chore: add ruff as dev dependency
- fix: eliminate phantom trades and unblock research signal generation
- feat(research): expand optimizer to full entry+exit+risk parameter space
- feat(research): quiet Telegram - only notify on meaningful results
- fix(research): unblock optimizer - 5 root causes killing all signal generation
- fix(research): stop replay phantom trades polluting production DB
- feat(research): parallelize research across available CPUs

Test plan
🤖 Generated with Claude Code