[WIP] Fix lane reward coefficients: wrong magnitude and sign #385
eugenevinitsky wants to merge 1 commit into 3.0 from
reward_lane_align and reward_lane_center were set to 1.0, but the randomization bounds are 0.00025 to 0.0025 (lane_align) and -0.00075 to -0.00025 (lane_center, negative = penalty). With coeff=1.0:
- Lane align penalty could reach -2.1 per step (nearly a collision penalty)
- Lane center reward was +0.2 per step (POSITIVE = rewarding being off-center)
- These massive, wrong-sign per-step rewards dominate all other signals and destroy explained variance

Fixed to:
- reward_lane_align: 1.0 → 0.025 (midpoint of bounds)
- reward_lane_center: 1.0 → -0.00075 (negative to penalize off-center)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
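To make the sign argument concrete, here is a minimal sketch that assumes the lane-center term is simply the coefficient times a nonnegative off-center distance. The actual lane-center signal used by the environment is not shown in this PR, so `lane_center_term` and the offsets below are hypothetical.

```python
# Hypothetical model (not from the PR): lane-center term = coeff * offset,
# where offset >= 0 is the distance from the lane center in some unit.

OLD_COEFF = 1.0        # default before this PR
NEW_COEFF = -0.00075   # default proposed in this PR


def lane_center_term(coeff: float, offset: float) -> float:
    """Per-step lane-center contribution under the assumed linear form."""
    return coeff * offset


for offset in (0.5, 1.0, 2.0):
    old = lane_center_term(OLD_COEFF, offset)
    new = lane_center_term(NEW_COEFF, offset)
    # A positive coefficient pays the agent more the further it drifts;
    # a negative one turns the same signal into a penalty.
    print(f"offset={offset:.1f}  old={old:+.5f}  new={new:+.6f}")
```

Under this assumed form, any positive coefficient rewards drifting away from the center. The sign, not just the scale, is what the change to a negative value addresses.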
Pull request overview
Updates the default lane-related reward coefficients in the Ocean Drive environment config to avoid lane rewards dominating per-step training signals and to correct the lane-center reward sign.
Changes:
- Reduce `reward_lane_align` coefficient from `1` to `0.025`.
- Change `reward_lane_center` coefficient from `1` to `-0.00075`.
 reward_offroad_collision = -0.5 # Use -0.05 for carla maps
-reward_lane_align = 1
-reward_lane_center = 1
+reward_lane_align = 0.025
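A minimal sketch of reading these keys with Python's standard `configparser`, assuming the file is plain INI. The `[env]` section name is hypothetical, and the `reward_lane_center` value comes from the PR description because the diff excerpt is cut off. How the project itself loads this file is not shown here.

```python
import configparser

# Hypothetical excerpt: "[env]" is an assumed section name.
INI_TEXT = """
[env]
reward_offroad_collision = -0.5 # Use -0.05 for carla maps
reward_lane_align = 0.025
reward_lane_center = -0.00075
"""

# Without inline_comment_prefixes, "# Use -0.05 for carla maps" would stay
# part of the value and getfloat() would raise ValueError.
parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
parser.read_string(INI_TEXT)

env = parser["env"]
coeffs = {
    key: env.getfloat(key)
    for key in ("reward_offroad_collision", "reward_lane_align", "reward_lane_center")
}
print(coeffs)
```

Note that `configparser` only treats `#` as an inline comment when it is preceded by whitespace, which holds for the `reward_offroad_collision` line.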
reward_lane_align is set to 0.025, but later in this same INI reward_bound_lane_align_max is 0.0025. This places the fixed coefficient outside the configured randomization/normalization bounds and contradicts the PR description's "midpoint" claim (the midpoint of 0.00025 to 0.0025 is 0.001375). It can also saturate reward-conditioning normalization and leaves it unclear which magnitude is intended. Please reconcile by either lowering reward_lane_align into the bound range (e.g., near the bound midpoint) or updating the reward_bound_lane_align_* values to include 0.025 if that is the intended scale.
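The range argument can be checked numerically. This sketch uses the bounds from the PR description and an assumed min-max normalization; `normalize` is hypothetical and is not the environment's reward-conditioning code.

```python
# Bounds taken from the PR description.
LANE_ALIGN_BOUNDS = (0.00025, 0.0025)
LANE_CENTER_BOUNDS = (-0.00075, -0.00025)


def midpoint(bounds):
    lo, hi = bounds
    return (lo + hi) / 2


def in_bounds(value, bounds):
    lo, hi = bounds
    return lo <= value <= hi


def normalize(value, bounds, clip=True):
    """Assumed min-max map to [0, 1]; out-of-range values saturate when clipped."""
    lo, hi = bounds
    x = (value - lo) / (hi - lo)
    return min(max(x, 0.0), 1.0) if clip else x


for name, value, bounds in [
    ("reward_lane_align (PR)", 0.025, LANE_ALIGN_BOUNDS),
    ("reward_lane_align (suggested)", 0.00125, LANE_ALIGN_BOUNDS),
    ("reward_lane_center (PR)", -0.00075, LANE_CENTER_BOUNDS),
]:
    print(f"{name}: in_bounds={in_bounds(value, bounds)}, "
          f"midpoint={midpoint(bounds):.6f}, "
          f"normalized={normalize(value, bounds, clip=False):.3f}")
```

Under these assumptions, 0.025 lands at about 11 on a scale meant to span 0 to 1, so clipping would pin it at 1.0. The suggested 0.00125 falls inside the range, near the midpoint of 0.001375. The -0.00075 lane-center value is in range but sits at the lower bound rather than at the midpoint (-0.0005).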
Suggested change
-reward_lane_align = 0.025
+reward_lane_align = 0.00125
Summary
- reward_lane_align: 1.0 → 0.025
- reward_lane_center: 1.0 → -0.00075

Problem
PR #296 introduced lane rewards with default coefficients of 1.0, but the randomization bounds suggest values ~1000x smaller. With coeff=1.0: