
fix(research): unblock optimizer, fix phantom trades, parallelize #16

Merged
ArielB1980 merged 15 commits into main from ArielB1980/audit-research-value
Apr 5, 2026

Conversation

Owner

@ArielB1980 ArielB1980 commented Mar 31, 2026

Summary

  • Fix 5 compounding root causes that killed all signal generation: config section mismatch (strategy.* → risk.*), holdout sentinel (-999%), partial data sentinel, score thresholds too high for neutral bias, and a 4H structure gate with no fallback
  • Fix phantom trade DB leak — research replay was writing to production trades table when DATABASE_URL was set (disable_db_mock=True on prod). Changed to always mock. Cleaned 4,517+ phantom trades from production DB.
  • Add guided parameter mutation — ParameterMemory tracks per-parameter score deltas, uses softmax-weighted selection and directional bias. Adaptive step decay (annealing) from 100% → 30%. Cross-run warm-start persistence.
  • Quiet Telegram — only notify on meaningful results (new best candidate, run complete, convergence, promotion)
  • Parallelize research across CPUs — splits symbols across N-1 workers (3 on 4-CPU droplet), merges results for post-run hooks. ~3x speedup.
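
The guided-mutation bullet above can be illustrated with a minimal sketch. It assumes the memory stores a list of score deltas per parameter and that the 100% → 30% step decay is linear over the run; neither detail is confirmed by this PR. `ParameterMemorySketch`, `selection_weights`, `pick`, and `step_scale` are hypothetical names, not the project's API, and directional bias and cross-run persistence are left out.

```python
import math
import random
from collections import defaultdict


class ParameterMemorySketch:
    """Hypothetical stand-in for ParameterMemory; not the project's class."""

    def __init__(self, temperature: float = 1.0) -> None:
        self.temperature = temperature
        self.deltas: dict[str, list[float]] = defaultdict(list)

    def record(self, name: str, score_delta: float) -> None:
        """Remember how much the score moved after mutating `name`."""
        self.deltas[name].append(score_delta)

    def mean_delta(self, name: str) -> float:
        history = self.deltas.get(name, [])
        return sum(history) / len(history) if history else 0.0

    def selection_weights(self, names: list[str]) -> dict[str, float]:
        """Softmax over mean score deltas (max-shifted for numerical stability)."""
        logits = [self.mean_delta(n) / self.temperature for n in names]
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        return {n: e / total for n, e in zip(names, exps)}

    def pick(self, names: list[str], rng: random.Random) -> str:
        """Sample the next parameter to mutate, favouring historically helpful ones."""
        weights = self.selection_weights(names)
        return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def step_scale(iteration: int, total_iterations: int,
               start: float = 1.0, end: float = 0.3) -> float:
    """Assumed linear annealing of the mutation step from `start` to `end`."""
    if total_iterations <= 1:
        return end
    frac = min(max(iteration / (total_iterations - 1), 0.0), 1.0)
    return start + (end - start) * frac
```

Parameters with no history get a mean delta of 0, so they stay selectable; a lower `temperature` concentrates sampling on the best-scoring parameters.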

Commits

  1. fix(research): unblock optimizer — strip env overrides, widen exploration, add bounds
  2. chore: add ruff as dev dependency
  3. fix: eliminate phantom trades and unblock research signal generation
  4. feat(research): expand optimizer to full entry+exit+risk parameter space
  5. feat(research): quiet Telegram — only notify on meaningful results
  6. fix(research): unblock optimizer — 5 root causes killing all signal generation
  7. fix(research): stop replay phantom trades polluting production DB
  8. feat(research): parallelize research across available CPUs

Test plan

  • Research daemon running with 3 parallel workers on production (BTC/ETH, SOL/XRP, ADA/LINK)
  • Verified 0 phantom trades written since DB mock fix deployed
  • Verified optimizer is actively generating trades in replay (13 BTC/USD baseline trades, 353 positions across iterations)
  • Cleaned production DB: 232 verified real trades remain, -$15.05 total PnL
  • Live bot stopped pending research results (no capital at risk)

🤖 Generated with Claude Code

ArielB1980 and others added 8 commits March 30, 2026 23:18
…tion, add bounds

Root cause: REPLAY_OVERRIDE_* env vars exported by the continuous daemon
were overriding the very config parameters research was trying to mutate,
making all candidates behave identically to baseline ("uninformative surface"
after 4 probes, every symbol, every cycle — 1,503 cycles with 0 results).

Fix 1: Strip REPLAY_OVERRIDE_* env vars during replay evaluation so that
       config_overrides from the harness actually reach the strategy engine.

Fix 2: Raise uninformative_surface_probe_count from 4 to 12 so the harness
       tries more mutations before giving up on a symbol.

Fix 3: When baseline produces 0 trades, use aggressive gate-lowering
       exploration for the first 6 iterations to find the signal region.

Fix 5: Add PARAMETER_BOUNDS with min/max per parameter to prevent
       degenerate values. _mutate_params now clamps to bounds.
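
Fix 1 and Fix 5 can be sketched roughly as follows. This is an illustration only: the helper names `without_replay_overrides` and `clamp_to_bounds`, the parameter name `strategy.example_bps`, and the bound values are hypothetical, and restoring the variables after evaluation is an assumption about the desired behaviour rather than something the commit states.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_replay_overrides(prefix: str = "REPLAY_OVERRIDE_"):
    """Temporarily remove override env vars so harness overrides take effect."""
    removed = {k: os.environ.pop(k) for k in list(os.environ) if k.startswith(prefix)}
    try:
        yield removed
    finally:
        # Assumption: the daemon's environment should look unchanged afterwards.
        os.environ.update(removed)


def clamp_to_bounds(params: dict[str, float],
                    bounds: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Clamp each mutated value into its (min, max) range; unknown names pass through."""
    clamped = {}
    for name, value in params.items():
        if name in bounds:
            low, high = bounds[name]
            value = min(max(value, low), high)
        clamped[name] = value
    return clamped


# Illustrative bounds only; the real PARAMETER_BOUNDS values are not shown in this PR.
EXAMPLE_BOUNDS = {"strategy.example_bps": (10.0, 80.0)}
```

Wrapping each replay evaluation in `with without_replay_overrides():` keeps the stripping scoped to the evaluation, and `clamp_to_bounds(mutated, EXAMPLE_BOUNDS)` would run just before a candidate is handed to the replay.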

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phantom trade fixes (6):
- Add 30-min dedup guard in phantom position import to prevent restart-loop duplicates
- Add 6-hour age gate on trade recording to skip stale historical artifacts
- Re-classify phantom positions after Case D purge in production takeover
- Mark stale zero-qty positions as trade_recorded on persistence load
- Isolate replay harness position DB via POSITION_PERSISTENCE_PATH env var
- Harden systemd restart policy (on-failure, 30s delay, burst limits)

Research pipeline fixes (4):
- Widen fib_proximity_bps 60→120 bps (max 80→160) — singular signal chokepoint
- Lower neutral score thresholds (tight_smc 65→55, wide_structure 60→50) for
  environments without 200-day EMA data
- Fix instrument spec registry clobbering — merge duplicate _index() calls into
  single call combining both BTC and XBT-aliased spec formats
- Replace synthetic min_size=1 (whole unit) with realistic per-asset minimums
  (BTC: 0.0001, ETH: 0.001, etc.) so position sizing works with small equity

Research allowlist additions:
- strategy.fib_proximity_bps, strategy.min_score_tight_smc_neutral,
  strategy.min_score_wide_structure_neutral now optimizable

Control test script added for March 12 validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The research optimizer was only tuning 18 entry gate parameters while the
core problem was fundamentally bad exits: 6.5% win rate, 100% stop-loss
exits, 36-second average hold times. Stops were 0.15-0.30 ATR (hit by
noise instantly), and TP/trailing/risk params were completely locked.

Changes:
- Expand allowlist from 18 → 33 optimizable parameters
- Add exit mechanics: TP1/TP2 R-multiples, close percentages, runner %,
  trailing stop ATR multiplier
- Add risk sizing: risk_per_trade_pct, target_leverage
- Add cost constraints: tight_smc_cost_cap_bps, min_rr_multiple,
  fee_edge_multiple_k
- Widen stop loss bounds significantly (tight_smc max from 1.5 → 3.0 ATR,
  wide_structure max from 2.5 → 4.0 ATR)
- Handle Optional config sections (multi_tp) in param reader with bound
  midpoint fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add important flag to _notify(). Default is quiet (log only). Only these
events send to Telegram: new best candidate, convergence, run completed,
replay gate results, and promotion queued.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eneration

1. Wrong config section in allowlist: tight_smc_cost_cap_bps, tight_smc_min_rr_multiple,
   fee_edge_multiple_k live on RiskConfig not StrategyConfig. Every replay crashed with
   "StrategyConfig has no field" → -999% sentinel → all symbols skipped. Fixed prefix
   from strategy.* to risk.*.

2. Holdout sentinel (-999%) caused immediate symbol skip: when holdout window had 0
   trades, evaluator returned -999% sentinel which triggered _is_non_informative_baseline
   → optimizer never tried any mutations. Changed to return 0% with penalty score so
   optimizer can explore looser parameters.

3. partial_data_non_comparable produced -999% sentinel: symbols with incomplete 15m
   coverage (30 days vs 120 requested) hit this path. Changed to return 0-trade
   metrics instead of hard failure.

4. Score thresholds too high for neutral bias: without EMA200 (needs 200 candles,
   only 120 available), all signals get neutral bias → HTF alignment capped at 10,
   EMA slope = 0 → max realistic score ~60. Lowered neutral thresholds:
   tight_smc_neutral 55→35, wide_structure_neutral 50→30.

5. 4H_STRUCTURE_REQUIRED hard gate blocked ~25% of signals: 1H structure fallback
   existed in code but was disabled. Enabled structure_fallback_enabled=true with
   reduced penalty (15→5 points).

Result: research now generates 9+ trades per baseline eval (was 0).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The evaluator passed disable_db_mock=True when DATABASE_URL was set,
causing the replay harness to write simulated trades directly into the
production trades table. This produced 4,517 phantom trades with <60s
duration and -$54K fake PnL.

Fix: always use the DB mock during research replay. Replay reads candle
data from CSV files, not the DB — it never needs real DB writes.

Also cleaned 4,517 phantom trade records from production DB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split symbol optimization across N-1 workers (one per CPU minus OS/collector).
Each worker gets isolated state, logs, and output dirs. Results are merged
after all workers complete so post-run hooks work unchanged.

On 4-CPU droplet: 3 workers × 2 symbols = ~3x faster than sequential.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ArielB1980 ArielB1980 changed the title from "fix(research): unblock optimizer and harden parameter exploration" to "fix(research): unblock optimizer, fix phantom trades, parallelize" on Apr 2, 2026
ArielB1980 and others added 7 commits April 2, 2026 16:26
Move StartLimitIntervalSec/StartLimitBurst to [Unit] section (correct
placement per systemd docs). Add one-shot trading_review.py diagnostic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
uv sync --dev uses [dependency-groups] not [project.optional-dependencies].
pytest was only in the latter, causing CI to fail with "Failed to spawn: pytest".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dels shim

These types were referenced by position_manager_v2.py and position_evaluator.py
but never defined in the shim module, causing ImportError in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The parallel worker monitoring loop was dying every ~10 minutes because
`[[ -n "" ]] && VAR="val"` returns exit 1 when the test fails, which
under `set -e` kills the entire daemon. Changed && chains to if/then,
wrapped loop in set +e, and added error tolerance to state merge step.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three changes to fix replay workers using only 30% CPU:

1. Disable OHLCV fetcher rate limiting in replay (semaphore 8→1000,
   min_delay 200ms→0, retries 3→1) — exchange sim is in-memory, no
   real API to throttle against. This was the main bottleneck (95% of
   time spent in epoll_wait).

2. Set LOG_LEVEL=WARNING for research workers — INFO logging was
   producing 280MB/worker of I/O, choking disk and CPU.

3. Support ${VAR:-default} syntax in config.yaml env var expansion.

Result: CPU utilization 16%→71%, per-worker 30%→75%, load 1.3→3.3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two changes to improve research result quality:

1. Default objective mode net_pnl_only → risk_adjusted. The composite
   score includes drawdown penalty (-0.8), Sharpe/Sortino (0.35), and
   win rate (0.1) — critical for avoiding high-variance curve fits
   that would fail live (19% win rate baseline).

2. Promotion minimum trades 10 → 20. With 120-day windows and 30%
   holdout, 10 trades is too few for statistical confidence.

Also fixed .env on production server:
- RESEARCH_CONT_WINDOW_OFFSETS: 0 → 0,90,180 (3 walk-forward windows)
- RESEARCH_CONT_REPLAY_TIMEFRAMES: added 1m back
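
The window arithmetic behind these settings can be sketched as below. It assumes each offset shifts the end of a 120-day window back by that many days and that the holdout is the final 30% of the window; both are readings of the commit message, not confirmed behaviour, and `walk_forward_windows` is a hypothetical helper.

```python
from datetime import date, timedelta


def walk_forward_windows(end: date, window_days: int = 120,
                         offsets: tuple[int, ...] = (0, 90, 180),
                         holdout_frac: float = 0.30) -> list[dict]:
    """Return train/holdout date ranges for each walk-forward offset."""
    holdout_days = round(window_days * holdout_frac)
    windows = []
    for offset in offsets:
        window_end = end - timedelta(days=offset)
        window_start = window_end - timedelta(days=window_days)
        holdout_start = window_end - timedelta(days=holdout_days)
        windows.append({
            "offset_days": offset,
            "train": (window_start, holdout_start),
            "holdout": (holdout_start, window_end),
        })
    return windows


if __name__ == "__main__":
    for w in walk_forward_windows(date(2026, 4, 5)):
        print(w["offset_days"], w["train"], w["holdout"])
```

Under these assumptions a 120-day window leaves 84 days for training and 36 days for holdout.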

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two new post-cycle diagnostics that test whether SMC signal logic
has genuine edge, independent of parameter optimization:

1. Signal Directional Accuracy — intercepts every signal during replay
   and measures if price moves in the predicted direction at 1h/4h/24h.
   Reports hit rate, p-value (binomial test vs 50%), and breakdown by
   setup type (OB/FVG/BOS/TREND) and direction.

2. Random Entry Baseline — runs N replay trials with random entries at
   the same frequency as the real strategy but using the same risk
   management stack. If random entries produce similar returns, the
   signals have no edge.

Both run automatically after each research cycle (alongside the
existing counterfactual twin) with timeout protection.
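
The hit-rate significance check in diagnostic 1 could look roughly like this. It is a sketch of an exact two-sided binomial test against a 50% null, written with the standard library only; `hit_rate` and `binomial_two_sided_p` are hypothetical names, and the diagnostic in this PR may compute its p-value differently (for example one-sided, or through a statistics library).

```python
from math import exp, lgamma, log


def hit_rate(hits: int, n: int) -> float:
    """Fraction of signals where price moved in the predicted direction."""
    if n <= 0:
        raise ValueError("n must be positive")
    return hits / n


def binomial_two_sided_p(hits: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed one under the null rate p."""
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")

    def pmf(k: int) -> float:
        # Log-space binomial PMF so large n does not overflow.
        log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        return exp(log_choose + k * log(p) + (n - k) * log(1.0 - p))

    probs = [pmf(k) for k in range(n + 1)]
    observed = probs[hits]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))
```

A hit rate close to 50% with a large p-value at every horizon would point to no measurable directional edge, which is the same question the random entry baseline asks from the returns side.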

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ArielB1980 ArielB1980 merged commit 3b9efd9 into main Apr 5, 2026
1 of 2 checks passed