[WIP] Fix lane reward coefficients: wrong magnitude and sign#385

Open
eugenevinitsky wants to merge 1 commit into 3.0 from fix/lane-reward-coefficients

@eugenevinitsky eugenevinitsky commented Apr 1, 2026

Summary

  • reward_lane_align: 1.0 → 0.025
  • reward_lane_center: 1.0 → -0.00075

Problem

PR #296 introduced lane rewards with default coefficients of 1.0, but the randomization bounds imply values roughly 400x to 4000x smaller. With coeff=1.0:

  • Lane align penalty reached -2.1 per step (nearly as large as a one-time collision penalty of -3.5)
  • Lane center had a positive coefficient, rewarding being off-center instead of penalizing it
  • These massive per-step rewards dominated all other signals and destroyed explained variance
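
To make the magnitude comparison concrete, here is a minimal sketch that counts how many steps of the quoted -2.1 per-step lane-align penalty it takes to outweigh one -3.5 collision penalty. Both figures come from the description above; the function name `steps_to_exceed` is hypothetical and nothing here is taken from the environment code.

```python
# Figures quoted in the PR description (coeff=1.0).
LANE_ALIGN_PER_STEP = -2.1   # lane-align penalty per step
COLLISION_ONE_TIME = -3.5    # one-time collision penalty


def steps_to_exceed(per_step: float, one_time: float) -> int:
    """Smallest step count at which the accumulated per-step penalty
    is larger in magnitude than the one-time penalty."""
    if per_step == 0:
        raise ValueError("per_step must be non-zero")
    steps = 1
    while abs(per_step) * steps <= abs(one_time):
        steps += 1
    return steps


if __name__ == "__main__":
    n = steps_to_exceed(LANE_ALIGN_PER_STEP, COLLISION_ONE_TIME)
    print(f"lane-align penalty outweighs a collision after {n} steps")
```

Under these figures, a short run of badly aligned steps already costs more than a collision, which is consistent with the claim that the lane terms swamp the other signals.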

reward_lane_align and reward_lane_center were set to 1.0, but the
randomization bounds are 0.00025-0.0025 (lane_align) and
-0.00075 to -0.00025 (lane_center, negative = penalty). With coeff=1.0:
- Lane align penalty could reach -2.1 per step (nearly a collision penalty)
- Lane center reward was +0.2 per step (POSITIVE = rewarding being off-center)
- These massive, wrong-sign per-step rewards dominate all other signals
  and destroy explained variance
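
The size and sign mismatch can be checked with a few lines of arithmetic. This is a rough sketch using only the bounds and the old default quoted above; the helper names `scale_factors` and `sign_matches` are hypothetical.

```python
# Randomization bounds and old default quoted in the PR description.
LANE_ALIGN_BOUNDS = (0.00025, 0.0025)
LANE_CENTER_BOUNDS = (-0.00075, -0.00025)
OLD_DEFAULT = 1.0


def scale_factors(coeff, bounds):
    """Return (smallest, largest) ratio of |coeff| to the bound magnitudes."""
    magnitudes = sorted(abs(b) for b in bounds)
    return abs(coeff) / magnitudes[1], abs(coeff) / magnitudes[0]


def sign_matches(coeff, bounds):
    """True if coeff has the same sign as both bounds."""
    return all((coeff > 0) == (b > 0) for b in bounds)


if __name__ == "__main__":
    for name, bounds in [("lane_align", LANE_ALIGN_BOUNDS),
                         ("lane_center", LANE_CENTER_BOUNDS)]:
        low, high = scale_factors(OLD_DEFAULT, bounds)
        ok = sign_matches(OLD_DEFAULT, bounds)
        print(f"{name}: {low:.0f}x to {high:.0f}x larger than bounds, sign ok: {ok}")
```

The sign check is the part that matters for lane_center: a positive coefficient applied to bounds that are entirely negative flips a penalty into a reward.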

Fixed to:
- reward_lane_align: 1.0 → 0.025 (midpoint of bounds)
- reward_lane_center: 1.0 → -0.00075 (negative to penalize off-center)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 1, 2026 12:30
@eugenevinitsky changed the title from "Fix lane reward coefficients: wrong magnitude and sign" to "[WIP] Fix lane reward coefficients: wrong magnitude and sign" Apr 1, 2026

Copilot AI left a comment

Pull request overview

Updates the default lane-related reward coefficients in the Ocean Drive environment config to avoid lane rewards dominating per-step training signals and to correct the lane-center reward sign.

Changes:

  • Reduce reward_lane_align coefficient from 1 to 0.025.
  • Change reward_lane_center coefficient from 1 to -0.00075.

  reward_offroad_collision = -0.5 # Use -0.05 for carla maps
- reward_lane_align = 1
- reward_lane_center = 1
+ reward_lane_align = 0.025

Copilot AI Apr 1, 2026

reward_lane_align is set to 0.025, but later in this same INI reward_bound_lane_align_max is 0.0025. This makes the fixed coefficient outside the configured randomization/normalization bounds (and contradicts the PR description about midpoints), which can saturate reward-conditioning normalization and makes it unclear which magnitude is intended. Please reconcile by either lowering reward_lane_align into the bound range (e.g., near the bound midpoint) or updating the reward_bound_lane_align_* values to include 0.025 if that’s the intended scale.

Suggested change
- reward_lane_align = 0.025
+ reward_lane_align = 0.00125
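
A check like the one described in this comment could be automated. The sketch below is only an illustration: the `[env]` section name and the `reward_bound_lane_align_min` key are assumptions (the comment names only `reward_bound_lane_align_max` and the `reward_bound_lane_align_*` pattern), and the min value is taken from the bounds quoted in the PR description. It parses an inline INI sample rather than the real config file.

```python
import configparser

# Hypothetical INI fragment; section name and the *_min key are assumed.
SAMPLE_INI = """
[env]
reward_lane_align = 0.025
reward_bound_lane_align_min = 0.00025
reward_bound_lane_align_max = 0.0025
"""


def check_coefficient(ini_text, section, name):
    """Return (in_bounds, midpoint) for reward_<name> against its bounds."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#", ";"))
    parser.read_string(ini_text)
    value = parser.getfloat(section, f"reward_{name}")
    lo = parser.getfloat(section, f"reward_bound_{name}_min")
    hi = parser.getfloat(section, f"reward_bound_{name}_max")
    return lo <= value <= hi, (lo + hi) / 2


if __name__ == "__main__":
    in_bounds, midpoint = check_coefficient(SAMPLE_INI, "env", "lane_align")
    print(f"in bounds: {in_bounds}, midpoint of bounds: {midpoint:.6f}")
```

With the values above, 0.025 falls outside the range, while the suggested 0.00125 falls inside it and sits close to the arithmetic midpoint of the two bounds.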
