Skip to content

like u8.unchecked_sub() but w/o nightly experimental API#14

Open
sbechet wants to merge 24 commits intobinaryfields:masterfrom
sbechet:uncheckedsub
Open

like u8.unchecked_sub() but w/o nightly experimental API#14
sbechet wants to merge 24 commits intobinaryfields:masterfrom
sbechet:uncheckedsub

Conversation

@sbechet
Copy link

@sbechet sbechet commented Feb 21, 2023

Solution for #13?

@zeozeozeo
Copy link

@binaryfields can this be merged?

@sbechet
Copy link
Author

sbechet commented May 10, 2024

where did you disappear Sebastian?

@mlund
Copy link

mlund commented Jan 25, 2026

I also ran into this issue and this patch fixed the panic. Ping @binaryfields

// counting down in the release state.
// This has been verified by sampling ENV3.
// NB! The operation below requires two's complement integer.
self.envelope_counter -= 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u8::unchecked_sub() is now stable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Mikael,

I asked Claude, here's "his" "opinion":

"Since u8::unchecked_sub() is now stable in Rust, wrapping_sub could be replaced with it — though I'd argue wrapping_sub is still the better choice here as it explicitly documents the intended wrap-around behavior of the hardware register."

sbechet added 22 commits March 7, 2026 16:15
Replace bare += / -= on envelope_counter and exponential_counter
with wrapping_add / wrapping_sub to correctly emulate the natural
unsigned 8-bit wrap-around of the real SID chip, as the C++ reference
implementation (reSID) relies on implicitly via unsigned char.

This fixes a panic on subtract-with-overflow in debug builds when
the envelope counter crosses zero in the Release state (issue binaryfields#13).

Also fix Default impl to match the state produced by reset():
- exponential_counter_period initialised to 1 (was 0)
- hold_zero initialised to true (was false)
In clock_delta(), reorder the dvlp/dvhp computation to match both
clock() and the C++ reference implementation: derive dv for one cycle
(w0 >> 8 * delta / >> 12 or >> 20), then scale by delta_flt. The
previous formulation multiplied w0_lp by delta_flt before shifting,
yielding different integer rounding than the single-cycle path.

Remove the redundant reset() call in new(): vlp, vhp and vo are
already initialised to 0 in the struct literal.
The MOS8580 has no DC offset on the mixer output (mixer_DC = 0 in
the C++ reference). The previous code defined a single MIXER_DC
constant using the MOS6581 formula and applied it unconditionally,
producing an incorrect DC offset for all MOS8580 instances.

Rename the constant to MIXER_DC_6581 and select the correct value
in new() based on chip_model, matching the set_chip_model() logic
in the reSID reference implementation.
interpolate() used last_index = len - 4, stopping one segment short
of the last curve segment. The C++ reference iterates while p2 != pn
(the final point), corresponding to last_index = len - 3. This caused
the last spline segment to be silently skipped, producing an incorrect
f0 table tail for both chip models.

Change PointPlotter output slice from i32 to i16 to match the actual
type of SPLINE6581_F0 / SPLINE8580_F0 referenced in filter.rs.

Replace the `if y > 0.0` guard in plot() with y.clamp(0.0, i16::MAX)
to match the C++ zero-floor behaviour and prevent silent overflow on
large positive values when casting to i16.
WAVE_ZERO and VOICE_DC hold MOS6581-specific values. Rename them to
WAVE_ZERO_6581 and VOICE_DC_6581 to make their scope explicit, consistent
with the MIXER_DC_6581 naming introduced in filter.rs. No functional change.
In new(), initialise shift directly to 0x007f_fff8 and remove the
redundant reset() call, matching the pattern used in external_filter.

In clock_delta(), add explicit parentheses around (shift << 1) &
SHIFT_MASK for consistency with the identical expression in clock().

In output(), replace the catch-all panic!() with an explicit
0x9..=0xf => 0 arm (matching the C++ reference which returns 0 for
unimplemented waveform combinations) followed by _ => unreachable!().
The waveform field is always masked to 4 bits so the panic could
never be triggered, but was misleading and semantically wrong.
…minants

Change State::hold_zero from [u8; 3] to [bool; 3] to match the bool
type used throughout the rest of the codebase. Remove the u8 encode/
decode conversions in read_state() and write_state() accordingly.

In write_state(), split wave and envelope access into separate blocks
to avoid a double mutable borrow of voices[i] that the original code
silently relied on the compiler accepting.

Introduce named constants ENV_STATE_ATTACK / DECAY_SUSTAIN / RELEASE
for the u8 serialisation of EnvState. The previous code used `as u8`
in read_state() and magic literals in write_state(), meaning a reorder
of the enum variants would silently corrupt saved state.
…WORDS

The final f64 was assembled as:
    f64::from_bits((ix0 as u64) << 32 | ix1 as u64)

ix0 is i32. In Rust, `i32 as u64` sign-extends: a negative ix0 would
set the upper 32 bits to 0xFFFFFFFF, corrupting the exponent and sign
bit of the result. The C reference uses INSERT_WORDS which copies the
raw 32-bit pattern with no sign interpretation.

Fix: cast via u32 first (`ix0 as u32 as u64`) to reinterpret bits
without sign extension before widening to u64.

The bug is dormant on typical SID audio values where ix0 stays
positive, but is a latent correctness issue for edge-case inputs.
…utput ordering

clock_fast: when the output buffer is full (index >= buffer.len()),
return (index, delta) immediately so the caller can resume with the
remaining cycles, matching the C++ `if (s >= n) { return s; }` path.
The previous break + tail-consume silently discarded remaining delta.

clock_interpolate (main loop): replace the incorrect output()-then-
clock() per-iteration pattern with the reference sequence:
  - (delta_sample - 1) bare clock() calls
  - one output() capture into prev_sample
  - one final clock()
This matches `for (i=0; i<delta_t_sample-1; i++) clock(); if (i<...)
{ sample_prev=output(); clock(); }` in the C++ reference.

clock_interpolate (tail path): add the missing final
`prev_sample = output(); clock()` after the delta-1 tail clocks,
matching the C++ tail that prepares prev_sample for the next call.
init_fir() computes a Kaiser-windowed sinc FIR table — an expensive
operation involving nested loops with I0() and sin() evaluations that
can take several milliseconds. It was called unconditionally on every
set_parameters() invocation even when clock_freq and sample_freq had
not changed (e.g. chip reset, filter toggle, emulator pause/resume).

Add clock_freq and sample_freq fields to Fir. In set_parameters(),
skip init_fir() when both match the cached values and the table is
already populated (fir.n != 0). The cache is invalidated whenever
either frequency changes, ensuring correctness.
syncable_voice(i).output() was called three times inline as arguments
to filter.clock() and filter.clock_delta(), each triggering a rotate3()
call and a full output computation. Cache the three values in locals
v0/v1/v2 before the filter call. Introduce voice_output(i) which builds
the Syncable directly from SYNC_SOURCE/SYNC_DEST indices, bypassing
rotate3() while preserving all voice3off and ring-mod logic.

synth: replace rotate3() in clock_delta() sync loop with index constants

The MSB-toggle loop in clock_delta() called syncable_voice(i) on each
iteration to read sync_dest.get_sync() and main.get_frequency(), paying
a rotate3() cost per voice per sub-step. This loop runs O(delta/freq)
times — potentially thousands of iterations per call at high oscillator
frequencies.

Replace with direct voices[SYNC_SOURCE[i]] / voices[SYNC_DEST[i]]
accesses using the compile-time SYNC_SOURCE and SYNC_DEST index tables,
which encode the same wiring as the C++ constructor without any runtime branching.
The previous implementation used a branch on slice lengths to decide
which iterator to put first in zip(), but zip() still couldn't prove
equal lengths to the compiler, leaving per-element bounds checks in
the generated code and blocking auto-vectorisation.

Truncate both slices to len = sample.len().min(fir.len()) before
zipping. The compiler can now statically prove both slices have the
same size, eliminate the bounds checks, and auto-vectorise — without
any unsafe. In practice the two slices are always the same length
(constructed that way in clock_resample_*), so min() is a no-op at
runtime.
`delta_sample - 1` and `delta - 1` on u32 values panic in debug builds
(and wrap silently in release) when the value is 0.

delta_sample can be 0 if cycles_per_sample is 0, which is the case
between new() and the first set_parameters() call.

Replace both with saturating_sub(1). Also remove the redundant
`if delta > 0` guard inside the tail path — delta >= 1 is already
established by the outer `if delta > 0` check.
clock_resample_interpolate and clock_resample_fast shared three
identical blocks — the ring buffer fill loop, the tail path, and the
16-bit saturation clamp — duplicated verbatim.

Extract them into:
  fill_ring_buffer(n)       — clock n cycles into the mirrored ring buffer
  drain_ring_buffer(...)    — shared tail path after the main loop
  saturate(v) -> i16        — clamp i32 convolution result to i16 range

Both clock_resample_* functions are now ~20 lines each, containing
only the logic that actually differs between them (linear interpolation
vs nearest-sample convolution).
The four offset helpers were identical except for a half-cycle bias
(1 << (FIXP_SHIFT - 1)) applied in Fast/Interpolate modes to round
delta to the nearest sample. Merge them into next_sample_offset(nearest)
and commit_sample_offset(next, nearest), with the bias factored through
a single `if nearest` expression. Call sites pass `true` (Fast,
Interpolate) or `false` (Resample, ResampleFast) to make the rounding
intent explicit at each use.

sampler: convert i0() and sqrt() to free functions

Both were &self methods that made no use of self — a misleading
signature that implied access to Sampler state. Promote them to
module-level free functions. Call sites in init_fir() are updated
accordingly. The cfg(feature = "std") / cfg(not(...)) split on sqrt
is preserved unchanged.
dvlp and dvhp depend on vlp and vhp which are updated at every
sub-step. The previous fix correctly reordered operations but
accidentally hoisted the dv calculations outside the loop, computing
them once from the initial state and multiplying by delta_flt. This
diverges from the C++ reference which recalculates both inside each
iteration. Move delta_flt into the dv expressions and remove the
post-multiply.
…+p2, ++p3)`.

    // pn points past the last element, so the loop stops when p2 == pn, i.e.
    // when i+2 == len-1  =>  i == len-3.  The last valid i is therefore len-4
    // and p3 lands on the last element (index len-1).  Using len-3 would
    // advance p3 one past the end → out-of-bounds.
@sbechet
Copy link
Author

sbechet commented Mar 7, 2026

Hello Sebastian,

I understand you're very busy but this piece of code deserved a little cleaning up..
If Mikael confirms these changes, could you push them to crates.io? I've updated the Cargo.toml semver.

Thx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants