[WIP] Fix lane reward coefficients: wrong magnitude and sign by eugenevinitsky · Pull Request #385 · Emerge-Lab/PufferDrive

eugenevinitsky · 2026-04-01T12:30:02Z

Summary

reward_lane_align: 1.0 → 0.025
reward_lane_center: 1.0 → -0.00075

Problem

PR #296 introduced lane rewards with default coefficients of 1.0, but the randomization bounds suggest values ~1000x smaller. With coeff=1.0:

Lane align penalty reached -2.1 per step (nearly as large as a one-time collision penalty of -3.5)
Lane center had positive coefficient, rewarding being off-center instead of penalizing it
These massive per-step rewards dominated all other signals and destroyed explained variance
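To make the magnitudes above concrete, here is a minimal arithmetic sketch. It assumes each lane term enters the per-step reward linearly as `coeff * raw_term`, which the PR does not state explicitly; the raw values are simply the per-step numbers reported in this PR at coeff=1.0, and the helper names are hypothetical.

```python
import math

# Per-step values reported in this PR with coeff = 1.0.
# Assumption: reward contribution = coeff * raw_term (linear scaling).
RAW_LANE_ALIGN = -2.1    # lane align penalty per step at coeff = 1.0
RAW_LANE_CENTER = 0.2    # lane center reward per step at coeff = 1.0
COLLISION_PENALTY = -3.5 # one-time collision penalty quoted in the PR


def contribution(raw_term: float, coeff: float) -> float:
    """Per-step reward contribution under the linear-scaling assumption."""
    return coeff * raw_term


# Old defaults (coeff = 1.0) versus the values proposed in this PR.
old_align = contribution(RAW_LANE_ALIGN, 1.0)
new_align = contribution(RAW_LANE_ALIGN, 0.025)
old_center = contribution(RAW_LANE_CENTER, 1.0)
new_center = contribution(RAW_LANE_CENTER, -0.00075)

# Number of worst-case steps at the old coefficient before the accumulated
# lane align penalty exceeds a single collision penalty.
steps_to_exceed_collision = math.ceil(COLLISION_PENALTY / old_align)

if __name__ == "__main__":
    print(f"lane_align  per step: {old_align:+.5f} -> {new_align:+.5f}")
    print(f"lane_center per step: {old_center:+.5f} -> {new_center:+.5f}")
    print(f"steps to exceed collision penalty (old): {steps_to_exceed_collision}")
```

Under that assumption, the sign flip on `reward_lane_center` is what turns the off-center term from a reward into a penalty, and the smaller `reward_lane_align` keeps the worst-case per-step term well below the collision penalty.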

reward_lane_align and reward_lane_center were set to 1.0, but the randomization bounds are 0.00025-0.0025 (lane_align) and -0.00075 to -0.00025 (lane_center, negative = penalty). With coeff=1.0:

- Lane align penalty could reach -2.1 per step (nearly a collision penalty)
- Lane center reward was +0.2 per step (POSITIVE = rewarding being off-center)
- These massive, wrong-sign per-step rewards dominate all other signals and destroy explained variance

Fixed to:

- reward_lane_align: 1.0 → 0.025 (midpoint of bounds)
- reward_lane_center: 1.0 → -0.00075 (negative to penalize off-center)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Updates the default lane-related reward coefficients in the Ocean Drive environment config to avoid lane rewards dominating per-step training signals and to correct the lane-center reward sign.

Changes:

Reduce reward_lane_align coefficient from 1 to 0.025.
Change reward_lane_center coefficient from 1 to -0.00075.


Copilot · 2026-04-01T12:34:02Z

pufferlib/config/ocean/drive.ini

 reward_offroad_collision = -0.5     # Use -0.05 for carla maps
-reward_lane_align = 1
-reward_lane_center = 1
+reward_lane_align = 0.025


reward_lane_align is set to 0.025, but later in this same INI reward_bound_lane_align_max is 0.0025. This makes the fixed coefficient outside the configured randomization/normalization bounds (and contradicts the PR description about midpoints), which can saturate reward-conditioning normalization and makes it unclear which magnitude is intended. Please reconcile by either lowering reward_lane_align into the bound range (e.g., near the bound midpoint) or updating the reward_bound_lane_align_* values to include 0.025 if that’s the intended scale.

Suggested change

-reward_lane_align = 0.025
+reward_lane_align = 0.00125
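The mismatch flagged in this review comment could be caught mechanically. The following is a minimal sketch, not part of the PR: it assumes the fixed coefficients and their bounds live in the same INI section, that bounds follow a `reward_bound_<name>_min` / `reward_bound_<name>_max` naming pattern (only `reward_bound_lane_align_max` is named in the review), and that the section is called `env`. The bound values are the ones quoted in the PR description.

```python
import configparser

# Hypothetical excerpt; the section name and the *_min / lane_center bound
# keys are assumptions, values are taken from the PR description.
INI_TEXT = """
[env]
reward_lane_align = 0.025
reward_lane_center = -0.00075
reward_bound_lane_align_min = 0.00025
reward_bound_lane_align_max = 0.0025
reward_bound_lane_center_min = -0.00075
reward_bound_lane_center_max = -0.00025
"""


def find_out_of_bounds(ini_text: str, section: str = "env") -> dict:
    """Return {key: (value, lo, hi)} for coefficients outside their bounds."""
    # inline_comment_prefixes lets values carry trailing "# ..." comments.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(ini_text)
    cfg = parser[section]

    violations = {}
    for key in cfg:
        # Only look at fixed coefficients, not the bound entries themselves.
        if not key.startswith("reward_") or key.startswith("reward_bound_"):
            continue
        stem = key[len("reward_"):]
        lo_key = f"reward_bound_{stem}_min"
        hi_key = f"reward_bound_{stem}_max"
        if lo_key not in cfg or hi_key not in cfg:
            continue  # no bounds configured for this coefficient
        value = cfg.getfloat(key)
        lo, hi = cfg.getfloat(lo_key), cfg.getfloat(hi_key)
        if not lo <= value <= hi:
            violations[key] = (value, lo, hi)
    return violations


if __name__ == "__main__":
    for key, (value, lo, hi) in find_out_of_bounds(INI_TEXT).items():
        print(f"{key} = {value} is outside [{lo}, {hi}]")
```

With the values from this PR, the check flags `reward_lane_align` and accepts `reward_lane_center`, which sits exactly on its lower bound. Swapping in the suggested 0.00125 would clear the violation.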

Copilot AI review requested due to automatic review settings April 1, 2026 12:30

Copilot started reviewing on behalf of eugenevinitsky April 1, 2026 12:30

eugenevinitsky changed the title ~~Fix lane reward coefficients: wrong magnitude and sign~~ [WIP] Fix lane reward coefficients: wrong magnitude and sign Apr 1, 2026

Copilot AI reviewed Apr 1, 2026

eugenevinitsky wants to merge 1 commit into 3.0 from fix/lane-reward-coefficients
