like u8.unchecked_sub() but w/o nightly experimental API#14
Open
sbechet wants to merge 24 commits intobinaryfields:masterfrom
Open
like u8.unchecked_sub() but w/o nightly experimental API#14sbechet wants to merge 24 commits intobinaryfields:masterfrom
sbechet wants to merge 24 commits intobinaryfields:masterfrom
Conversation
|
@binaryfields can this be merged? |
Author
|
where did you disappear Sebastian? |
|
I also ran into this issue and this patch fixed the panic. Ping @binaryfields |
mlund
reviewed
Jan 26, 2026
| // counting down in the release state. | ||
| // This has been verified by sampling ENV3. | ||
| // NB! The operation below requires two's complement integer. | ||
| self.envelope_counter -= 1; |
Author
There was a problem hiding this comment.
Hello Mikael,
I asked Claude, here's "his" "opinion":
"Since u8::unchecked_sub() is now stable in Rust, wrapping_sub could be replaced with it — though I'd argue wrapping_sub is still the better choice here as it explicitly documents the intended wrap-around behavior of the hardware register."
Replace bare += / -= on envelope_counter and exponential_counter with wrapping_add / wrapping_sub to correctly emulate the natural unsigned 8-bit wrap-around of the real SID chip, as the C++ reference implementation (reSID) relies on implicitly via unsigned char. This fixes a panic on subtract-with-overflow in debug builds when the envelope counter crosses zero in the Release state (issue binaryfields#13). Also fix Default impl to match the state produced by reset(): - exponential_counter_period initialised to 1 (was 0) - hold_zero initialised to true (was false)
In clock_delta(), reorder the dvlp/dvhp computation to match both clock() and the C++ reference implementation: derive dv for one cycle (w0 >> 8 * delta / >> 12 or >> 20), then scale by delta_flt. The previous formulation multiplied w0_lp by delta_flt before shifting, yielding different integer rounding than the single-cycle path. Remove the redundant reset() call in new(): vlp, vhp and vo are already initialised to 0 in the struct literal.
The MOS8580 has no DC offset on the mixer output (mixer_DC = 0 in the C++ reference). The previous code defined a single MIXER_DC constant using the MOS6581 formula and applied it unconditionally, producing an incorrect DC offset for all MOS8580 instances. Rename the constant to MIXER_DC_6581 and select the correct value in new() based on chip_model, matching the set_chip_model() logic in the reSID reference implementation.
interpolate() used last_index = len - 4, stopping one segment short of the last curve segment. The C++ reference iterates while p2 != pn (the final point), corresponding to last_index = len - 3. This caused the last spline segment to be silently skipped, producing an incorrect f0 table tail for both chip models. Change PointPlotter output slice from i32 to i16 to match the actual type of SPLINE6581_F0 / SPLINE8580_F0 referenced in filter.rs. Replace the `if y > 0.0` guard in plot() with y.clamp(0.0, i16::MAX) to match the C++ zero-floor behaviour and prevent silent overflow on large positive values when casting to i16.
WAVE_ZERO and VOICE_DC hold MOS6581-specific values. Rename them to WAVE_ZERO_6581 and VOICE_DC_6581 to make their scope explicit, consistent with the MIXER_DC_6581 naming introduced in filter.rs. No functional change.
In new(), initialise shift directly to 0x007f_fff8 and remove the redundant reset() call, matching the pattern used in external_filter. In clock_delta(), add explicit parentheses around (shift << 1) & SHIFT_MASK for consistency with the identical expression in clock(). In output(), replace the catch-all panic!() with an explicit 0x9..=0xf => 0 arm (matching the C++ reference which returns 0 for unimplemented waveform combinations) followed by _ => unreachable!(). The waveform field is always masked to 4 bits so the panic could never be triggered, but was misleading and semantically wrong.
…minants Change State::hold_zero from [u8; 3] to [bool; 3] to match the bool type used throughout the rest of the codebase. Remove the u8 encode/ decode conversions in read_state() and write_state() accordingly. In write_state(), split wave and envelope access into separate blocks to avoid a double mutable borrow of voices[i] that the original code silently relied on the compiler accepting. Introduce named constants ENV_STATE_ATTACK / DECAY_SUSTAIN / RELEASE for the u8 serialisation of EnvState. The previous code used `as u8` in read_state() and magic literals in write_state(), meaning a reorder of the enum variants would silently corrupt saved state.
…WORDS
The final f64 was assembled as:
f64::from_bits((ix0 as u64) << 32 | ix1 as u64)
ix0 is i32. In Rust, `i32 as u64` sign-extends: a negative ix0 would
set the upper 32 bits to 0xFFFFFFFF, corrupting the exponent and sign
bit of the result. The C reference uses INSERT_WORDS which copies the
raw 32-bit pattern with no sign interpretation.
Fix: cast via u32 first (`ix0 as u32 as u64`) to reinterpret bits
without sign extension before widening to u64.
The bug is dormant on typical SID audio values where ix0 stays
positive, but is a latent correctness issue for edge-case inputs.
…utput ordering
clock_fast: when the output buffer is full (index >= buffer.len()),
return (index, delta) immediately so the caller can resume with the
remaining cycles, matching the C++ `if (s >= n) { return s; }` path.
The previous break + tail-consume silently discarded remaining delta.
clock_interpolate (main loop): replace the incorrect output()-then-
clock() per-iteration pattern with the reference sequence:
- (delta_sample - 1) bare clock() calls
- one output() capture into prev_sample
- one final clock()
This matches `for (i=0; i<delta_t_sample-1; i++) clock(); if (i<...)
{ sample_prev=output(); clock(); }` in the C++ reference.
clock_interpolate (tail path): add the missing final
`prev_sample = output(); clock()` after the delta-1 tail clocks,
matching the C++ tail that prepares prev_sample for the next call.
init_fir() computes a Kaiser-windowed sinc FIR table — an expensive operation involving nested loops with I0() and sin() evaluations that can take several milliseconds. It was called unconditionally on every set_parameters() invocation even when clock_freq and sample_freq had not changed (e.g. chip reset, filter toggle, emulator pause/resume). Add clock_freq and sample_freq fields to Fir. In set_parameters(), skip init_fir() when both match the cached values and the table is already populated (fir.n != 0). The cache is invalidated whenever either frequency changes, ensuring correctness.
syncable_voice(i).output() was called three times inline as arguments to filter.clock() and filter.clock_delta(), each triggering a rotate3() call and a full output computation. Cache the three values in locals v0/v1/v2 before the filter call. Introduce voice_output(i) which builds the Syncable directly from SYNC_SOURCE/SYNC_DEST indices, bypassing rotate3() while preserving all voice3off and ring-mod logic. synth: replace rotate3() in clock_delta() sync loop with index constants The MSB-toggle loop in clock_delta() called syncable_voice(i) on each iteration to read sync_dest.get_sync() and main.get_frequency(), paying a rotate3() cost per voice per sub-step. This loop runs O(delta/freq) times — potentially thousands of iterations per call at high oscillator frequencies. Replace with direct voices[SYNC_SOURCE[i]] / voices[SYNC_DEST[i]] accesses using the compile-time SYNC_SOURCE and SYNC_DEST index tables, which encode the same wiring as the C++ constructor without any runtime branching.
The previous implementation used a branch on slice lengths to decide which iterator to put first in zip(), but zip() still couldn't prove equal lengths to the compiler, leaving per-element bounds checks in the generated code and blocking auto-vectorisation. Truncate both slices to len = sample.len().min(fir.len()) before zipping. The compiler can now statically prove both slices have the same size, eliminate the bounds checks, and auto-vectorise — without any unsafe. In practice the two slices are always the same length (constructed that way in clock_resample_*), so min() is a no-op at runtime.
`delta_sample - 1` and `delta - 1` on u32 values panic in debug builds (and wrap silently in release) when the value is 0. delta_sample can be 0 if cycles_per_sample is 0, which is the case between new() and the first set_parameters() call. Replace both with saturating_sub(1). Also remove the redundant `if delta > 0` guard inside the tail path — delta >= 1 is already established by the outer `if delta > 0` check.
clock_resample_interpolate and clock_resample_fast shared three identical blocks — the ring buffer fill loop, the tail path, and the 16-bit saturation clamp — duplicated verbatim. Extract them into: fill_ring_buffer(n) — clock n cycles into the mirrored ring buffer drain_ring_buffer(...) — shared tail path after the main loop saturate(v) -> i16 — clamp i32 convolution result to i16 range Both clock_resample_* functions are now ~20 lines each, containing only the logic that actually differs between them (linear interpolation vs nearest-sample convolution).
The four offset helpers were identical except for a half-cycle bias (1 << (FIXP_SHIFT - 1)) applied in Fast/Interpolate modes to round delta to the nearest sample. Merge them into next_sample_offset(nearest) and commit_sample_offset(next, nearest), with the bias factored through a single `if nearest` expression. Call sites pass `true` (Fast, Interpolate) or `false` (Resample, ResampleFast) to make the rounding intent explicit at each use. sampler: convert i0() and sqrt() to free functions Both were &self methods that made no use of self — a misleading signature that implied access to Sampler state. Promote them to module-level free functions. Call sites in init_fir() are updated accordingly. The cfg(feature = "std") / cfg(not(...)) split on sqrt is preserved unchanged.
dvlp and dvhp depend on vlp and vhp which are updated at every sub-step. The previous fix correctly reordered operations but accidentally hoisted the dv calculations outside the loop, computing them once from the initial state and multiplying by delta_flt. This diverges from the C++ reference which recalculates both inside each iteration. Move delta_flt into the dv expressions and remove the post-multiply.
…+p2, ++p3)`.
// pn points past the last element, so the loop stops when p2 == pn, i.e.
// when i+2 == len-1 => i == len-3. The last valid i is therefore len-4
// and p3 lands on the last element (index len-1). Using len-3 would
// advance p3 one past the end → out-of-bounds.
Author
|
Hello Sebastian, I understand you're very busy but this piece of code deserved a little cleaning up.. Thx. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Solution for #13?