
oddio's Introduction

Oddio


Oddio is a game-oriented audio library that is:

  • Lightweight: Fast compilation, few dependencies, and a simple interface
  • Sans I/O: Send output wherever you like
  • Real-time: Audio output is efficient and wait-free: no glitches until you run out of CPU
  • 3D: Spatialization with Doppler effects and propagation delay available out of the box
  • Extensible: Implement Signal for custom streaming synthesis and filtering
  • Composable: Signals can be transformed without obstructing the inner Signal's controls

Example

let (mut scene_handle, mut scene) = oddio::SpatialScene::new();

// In audio callback:
let out_frames = oddio::frame_stereo(data);
oddio::run(&mut scene, output_sample_rate, out_frames);

// In game logic:
let frames = oddio::FramesSignal::from(oddio::Frames::from_slice(sample_rate, &frames));
let mut handle = scene_handle
    .play(frames, oddio::SpatialOptions { position, velocity, ..Default::default() });

// When position/velocity changes:
handle.set_motion(position, velocity, false);

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

oddio's People

Contributors

davidster, hasenbanck, joseluis, mooman219, olegoandreev, ralith, sanbox-irl


oddio's Issues

DiscreteSignal

A separate trait for discrete sources, convertible to continuous sources by combining with an interpolator (see also #12), could reduce code duplication by allowing interpolator reuse, and simplify certain source transformations such as looping.
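
A hedged sketch of what such a split might look like; none of these names exist in oddio, and the adapter shows only linear interpolation for brevity:

trait DiscreteSignal {
    type Frame;
    /// Native sample rate of the underlying data, in Hz.
    fn rate(&self) -> u32;
    /// The frame at integer index `t`, e.g. a table lookup.
    fn get(&self, t: u64) -> Self::Frame;
}

/// Adapts any DiscreteSignal to continuous time; the same adapter (and any
/// higher-order interpolator, see also #12) could be reused by every discrete
/// source, and looping becomes simple index arithmetic.
struct Interpolated<T> {
    inner: T,
    /// Fractional playhead position, in source frames.
    position: f64,
}

impl<T: DiscreteSignal<Frame = f32>> Interpolated<T> {
    /// Advance by `dt` seconds and return the linearly interpolated frame.
    fn next(&mut self, dt: f64) -> f32 {
        let i = self.position.floor() as u64;
        let frac = (self.position - self.position.floor()) as f32;
        let a = self.inner.get(i);
        let b = self.inner.get(i + 1);
        self.position += dt * self.inner.rate() as f64;
        a + frac * (b - a)
    }
}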

Support position discontinuities in spatialization

Spatialization currently assumes that all source and listener movement is continuous, i.e. no teleporting. As a result, discontinuous motion is interpreted as extremely fast motion, leading to highly aliased sampling in the propagation delay/doppler effect model. This should be fixed by allowing source controls to declare a discontinuity, causing sampling to restart based on the new parameters rather than smoothly slewing to them.
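
Hedged illustration, reusing the README snippet above (where the third argument to set_motion is the discontinuity flag): a teleport would be declared rather than smoothly slewed.

// Hypothetical continuation of the README example: `true` marks the motion
// as discontinuous, so sampling restarts at the new parameters instead of
// being interpreted as extremely fast movement.
handle.set_motion(position, velocity, true);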

Gap-free nonlinear sequencing

@tesselode has described some interesting use cases where sounds are played according to complex rules with precise timing based on dynamic gameplay conditions. This is difficult to support: simply playing new sounds immediately in response to gameplay events will produce relative timing jitter in their playback; while this is fine for common gameplay sounds like gunshots, if the samples are intended to be sections of contiguous music then sample-perfect relative timing may be required.

They've solved the problem by allowing the audio thread's current time to be polled from non-realtime gameplay logic, and by allowing the playback of new sounds to be scheduled for precise times in the future. By working slightly in advance, this allows sounds to be timed precisely with respect to each other based on near-realtime gameplay state.

This alone would be a reasonable primitive for us to provide. However, I suspect the ergonomics can be improved further with clever use of async/await. A specialized async runtime could mask the latency offset needed to schedule work ahead of time by maintaining a virtual "now": when the runtime is polled, it iteratively completes time-based audio futures in order until the virtual "now" reaches the current audio time plus the fixed latency offset, polling all outstanding tasks after each completion. Any playback command issued from within a task is always relative to the virtual "now". This would allow code like handle.play(foo).await; handle.play(bar).await; to Just Work gap-free, with arbitrarily complex control flow and inspection of game logic folded in.
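
A hedged sketch of that basic scheduling primitive (not the async layer): a clock the game thread can poll, plus sample-accurate playback of sounds queued slightly ahead of time. None of this is oddio API; the mix is mono and the queue is a plain Vec for clarity, where a real version would hand sounds to the audio thread through a realtime-safe channel.

struct Scheduled {
    start: u64,       // absolute start time, in samples
    frames: Vec<f32>, // mono samples to play
}

struct Sequencer {
    now: u64, // samples rendered so far; the "audio thread's current time"
    sounds: Vec<Scheduled>,
}

impl Sequencer {
    /// Game logic polls this and schedules at `time() + latency_margin`.
    fn time(&self) -> u64 {
        self.now
    }

    /// Queue `frames` to begin exactly at sample `start`.
    fn play_at(&mut self, start: u64, frames: Vec<f32>) {
        self.sounds.push(Scheduled { start, frames });
    }

    /// Audio callback: mix every overlapping sound into `out`.
    fn render(&mut self, out: &mut [f32]) {
        out.fill(0.0);
        let block_end = self.now + out.len() as u64;
        for s in &self.sounds {
            let begin = s.start.max(self.now);
            let end = (s.start + s.frames.len() as u64).min(block_end);
            for t in begin..end {
                out[(t - self.now) as usize] += s.frames[(t - s.start) as usize];
            }
        }
        self.sounds
            .retain(|s| s.start + s.frames.len() as u64 > block_end);
        self.now = block_end;
    }
}

The async layer described above would then translate each .await into a play_at call against a virtual "now" that advances as futures complete, rather than against wall-clock time.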

Add Seek for Gain

Currently Gain is not seekable, which forces the use of SpatialSceneControl::play_buffered instead of SpatialSceneControl::play; this is both less ergonomic and slower (even with optimizations).

OTOH, per-signal gain seems to be standard in many other libraries, e.g. kira, soloud, fyrox-sound.

We can either make Gain seekable or add set_gain (and related methods) to SpatialControl. If we choose to impl Seek for Gain there is a question of whether we should rewind the gain when we seek back in time.

Safer primitives for controlled signals

Right now, defining a controlled signal always involves unsafe shared memory handling. We should explore ways to leverage the type system to reduce the hazards here, ideally without forcing additional layers of indirection. Maybe a pattern for safe inline shared-memory primitives could be developed based on newtyped references with private constructors and projection?
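
A loose, hedged illustration of the newtype-with-private-constructor idea, using a single atomic for simplicity (the real hazards involve richer shared state than this); the names are made up:

use core::sync::atomic::{AtomicU32, Ordering};

/// Shared state for a gain control, stored as f32 bits (as oddio's Gain does).
pub struct GainShared(AtomicU32);

/// Newtyped reference handed to the signal; can only be built via `split`.
pub struct AudioSide<'a>(&'a GainShared);

/// Newtyped reference handed to the control handle.
pub struct ControlSide<'a>(&'a GainShared);

impl GainShared {
    pub fn new(gain: f32) -> Self {
        Self(AtomicU32::new(gain.to_bits()))
    }

    /// The only way to obtain the two sides: all access is funneled through
    /// the small, audited surface below, with no extra indirection.
    pub fn split(&self) -> (AudioSide<'_>, ControlSide<'_>) {
        (AudioSide(self), ControlSide(self))
    }
}

impl AudioSide<'_> {
    pub fn get(&self) -> f32 {
        f32::from_bits((self.0).0.load(Ordering::Relaxed))
    }
}

impl ControlSide<'_> {
    pub fn set(&self, gain: f32) {
        (self.0).0.store(gain.to_bits(), Ordering::Relaxed)
    }
}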

Occlusion

This could be as simple as a filter combinator that allows another filter to be bypassed dynamically, making it easy to toggle a low-pass filter on or off for a source, and/or a low-pass filter whose cutoff can be adjusted dynamically.
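
For the second option, a hedged sketch of the DSP core only (a one-pole low-pass whose cutoff can be changed at runtime); the wiring into a Signal/control-handle pair is omitted and all names are illustrative:

struct LowPass {
    state: f32, // previous output sample (mono for brevity)
}

impl LowPass {
    /// Filter `out` in place; `dt` is the sample interval in seconds. A very
    /// high `cutoff_hz` is effectively transparent (occlusion "off").
    fn process(&mut self, cutoff_hz: f32, dt: f32, out: &mut [f32]) {
        // Standard one-pole coefficient: a = 1 - exp(-2*pi*fc*dt)
        let a = 1.0 - (-core::f32::consts::TAU * cutoff_hz * dt).exp();
        for x in out {
            self.state += a * (*x - self.state);
            *x = self.state;
        }
    }
}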

Consider pluggable interpolation

Resampling in stream::Receiver and SamplesSource is currently hardcoded to use linear interpolation. Higher quality might be obtained by using higher-order polynomial interpolation, at a latency and CPU time cost. It's unclear if the quality difference would be meaningful.
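
For reference, here is the linear interpolation currently in use next to a 4-point cubic (Catmull-Rom) alternative; this is generic DSP rather than oddio code, with t the fractional position between samples x1 and x2:

fn lerp(x1: f32, x2: f32, t: f32) -> f32 {
    x1 + t * (x2 - x1)
}

/// 4-point Catmull-Rom: smoother than lerp, at the cost of one extra sample
/// of latency and a few more multiplies per output frame.
fn catmull_rom(x0: f32, x1: f32, x2: f32, x3: f32, t: f32) -> f32 {
    let a = 0.5 * (3.0 * (x1 - x2) + x3 - x0);
    let b = x0 - 2.5 * x1 + 2.0 * x2 - 0.5 * x3;
    let c = 0.5 * (x2 - x0);
    x1 + t * (c + t * (b + t * a))
}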

Short audio sounds different than it should (not sure if this is an oddio issue)

Hey!

We have this issue on notan where short audio seems to sound different depending on where you play it. I am not sure at all whether this is an issue in notan, Oddio, or maybe Symphonia, but I have been stuck on this for a while, and I thought it might be a good idea to ask here to see if this rings any bells and somebody can give me a hint or point me in the right direction.

Thanks!

Issue: Nazariglez/notan#206

Add Licenses

I realized that the licenses aren't included in the repository. I've submitted some, lol, comments and minor code, so just to be very legal, I give my permission for those contributions to be licensed under MIT and Apache 2.0.

SPSC memory is allocated twice

This is probably not a big deal, I just wanted to mention it because it surprised me when I first realized it ...

oddio/src/spsc.rs

Lines 197 to 202 in adc60db

let mem = alloc::alloc(layout);
mem.cast::<Header>().write(Header {
    read: AtomicUsize::new(0),
    write: AtomicUsize::new(0),
});
Box::from_raw(ptr::slice_from_raw_parts_mut(mem, capacity) as *mut Self).into()

After allocating all memory, the Box is converted to an Arc which allocates completely new memory and copies the initialized values to it. AFAICT, the originally allocated memory is deallocated again at the end of the new() method.

Is this intentional?
Or am I missing something?

I found this when trying to convert my own SPSC ring buffer to a DST (see mgeier/rtrb#75), where this crate's code was very helpful. I avoided the repeated allocation by manually implementing a small subset of Arc for this special case instead of using Arc directly.

Stop<T> can cause popping

oddio/src/mixer.rs

Lines 92 to 98 in adc60db

if signal.is_stopped() {
    this.set.remove(i);
    continue;
}
if signal.is_paused() {
    continue;
}

Abruptly halting the signal can create pops here. This state transition will need to be smoothed to some degree.
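
A hedged sketch of one way to smooth it: rather than removing a stopped signal immediately, apply a short fade to the final block it produces and only then drop it from the set. The helper below is illustrative; how it is wired into the mixer is left open.

/// Apply a linear fade-out over the last `fade_len` frames of `out`, so a
/// signal that is about to be removed never ends on a nonzero sample.
fn fade_out_tail(out: &mut [f32], fade_len: usize) {
    let n = out.len();
    let fade_len = fade_len.min(n);
    for i in 0..fade_len {
        let gain = (fade_len - 1 - i) as f32 / fade_len as f32;
        out[n - fade_len + i] *= gain;
    }
}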

Built-in method to tell remaining time on a Signal

I'm designing some UI widgets which should provide a track to show currently playing audio. It needs to know how much time is remaining within the track to show the user how far into the track we currently are.

Right now, it isn't possible to access that behavior on FramesSignal directly.

I spoke to Ralith in the Gamedev discord group, and he posted the following:

rolling your own signal is the proper way to get arbitrary feedback in general; I'm not necessarily opposed to baking this particular feature into FramesSignal though

I'd be in support of baking that feature in. I wouldn't mind providing that PR, but I'm reluctant to commit any serious code to a codebase I have only a trivial understanding of. If you posted some guidance, I'd give it a go, but I'd also understand that for something like this, doing it yourself might be preferable. Either way, thanks for all the hard work.
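
In the meantime, a hedged sketch of the "roll your own signal" approach: a wrapper that publishes how many seconds it has sampled through an atomic the UI thread can read, following the Signal shape used elsewhere on this page; only sample is shown and the names are illustrative. The UI computes the remaining time as the track's total duration minus this elapsed value.

use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

pub struct Elapsed<T> {
    inner: T,
    /// Seconds sampled so far, stored as f32 bits; clone this Arc for the UI.
    pub seconds: Arc<AtomicU32>,
}

impl<T: oddio::Signal> oddio::Signal for Elapsed<T> {
    type Frame = T::Frame;

    fn sample(&self, interval: f32, out: &mut [Self::Frame]) {
        self.inner.sample(interval, out);
        let t = f32::from_bits(self.seconds.load(Ordering::Relaxed));
        let t = t + interval * out.len() as f32;
        self.seconds.store(t.to_bits(), Ordering::Relaxed);
    }
}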

Sampling controlled sources with negative offsets is prone to discontinuities

For example, the current Speed control will skip back in time if the speed is increased, producing a pop. Even simpler controls like Gain cannot smoothly blend between states because they cannot reliably judge what the state was when previously sampled for a given stream (i.e. run of sample calls whose output will be played contiguously), since e.g. Spatial runs multiple sampling streams concurrently. Existing code assumes that transitions will occur immediately before sampling at offset 0, but e.g. propagation delay in Spatial undermines that.

Perfectly consistent behavior would require keeping a log of all control changes and looking up the effective control for each time period when sampling that period, but that is unacceptable due to unbounded memory requirements. However, we don't actually need perfect consistency, just a guarantee that no pops will occur and that the latest control states are used for current and future output. This could be accomplished by expanding the Signal interface to allow prior states to be tracked per stream. For example:

trait Signal {
    type Sampler: Sampler<Self>;
    fn sampler(&self) -> Self::Sampler;
    // ...
}
trait Sampler<T: Signal> {
    fn sample(&mut self, signal: &T, offset: f32, interval: f32, out: &mut [T::Frame]);
}

Tone mapping

Sources are currently mixed by simple summation, which will lead to clipping in some scenes. Dynamic ranging similar to that used in HDR graphics could prevent this, and enable a much higher dynamic range in a single scene, e.g. allowing whispers to be heard so long as there aren't gunshots competing for your attention.
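
A hedged illustration of the idea, not a proposal for oddio's actual design: a trivial peak-tracking limiter applied to the mixed output, loosely analogous to exposure adaptation in HDR rendering; the attack/release constants are made up.

struct Limiter {
    envelope: f32, // smoothed peak estimate
}

impl Limiter {
    fn new() -> Self {
        Self { envelope: 0.0 }
    }

    /// Scale the mixed block down whenever the tracked peak exceeds 1.0.
    fn process(&mut self, out: &mut [f32]) {
        for x in out {
            let level = x.abs();
            // Fast attack, slow release (per-sample smoothing coefficients)
            let coeff = if level > self.envelope { 0.5 } else { 0.001 };
            self.envelope += coeff * (level - self.envelope);
            if self.envelope > 1.0 {
                *x /= self.envelope;
            }
        }
    }
}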

SpatialScene walk_set not stopping Streams correctly

I am using a Handle<SpatialBuffered<Stop<Gain<Stream<f32>>>>>.

I am streaming audio from a file, and when I reach the end of the file I drop the handle. This means I am relying on the handle_dropped check to signal to the Stream that it is closed, and to allow remaining buffered audio in Stream to drain.

However, I notice that the signal is never actually stopped and removed from the set because the check includes distance; obviously it is very unlikely that the distance will ever reach 0 for most spatial sounds. So none of my spatial sounds are ever stopped; they end up leaking memory, and once there are enough sounds it affects playback performance too.

oddio/src/spatial.rs

Lines 250 to 254 in 60d4c87

let distance = norm(prev_position.into());
let remaining = stop.remaining() + distance / SPEED_OF_SOUND;
if remaining <= 0.0 {
    stop.stop();
}

Smoothing over variable time.

Smoothed currently smooths over a fixed time frame.

oddio/src/gain.rs

Lines 37 to 48 in 1d6ff52

fn sample(&self, interval: f32, out: &mut [T::Frame]) {
    self.inner.sample(interval, out);
    let shared = f32::from_bits(self.shared.load(Ordering::Relaxed));
    let mut gain = self.gain.borrow_mut();
    if gain.get() != shared {
        gain.set(shared);
    }
    for x in out {
        *x = frame::scale(x, gain.get());
        gain.advance(interval / SMOOTHING_PERIOD);
    }
}

const SMOOTHING_PERIOD: f32 = 0.1;

I would like to smooth over a variable time frame such that controls like Gain can be reused as say a fade. I believe this would make the API more robust.

My naïve approach would be to control the gain and smooth interval independently i.e.

gain.set_smooth_interval(5.0)
gain.set_gain(1.0)

If a smooth is in progress when set_smooth_interval is called, I think smoothing the remaining distance over the full interval is reasonable.
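
A hedged sketch of the proposal, modeled on the gain.rs excerpt above: the target gain and the smoothing duration live in separate atomics, and the current gain ramps toward the target over that duration (here with a simple exponential approach rather than oddio's Smoothed helper). Names are illustrative only.

use core::cell::Cell;
use core::sync::atomic::{AtomicU32, Ordering};

struct VarSmoothGain<T> {
    inner: T,
    target: AtomicU32,   // f32 bits, written by set_gain()
    duration: AtomicU32, // f32 bits (seconds), written by set_smooth_interval()
    current: Cell<f32>,
}

impl<T: oddio::Signal<Frame = f32>> oddio::Signal for VarSmoothGain<T> {
    type Frame = f32;

    fn sample(&self, interval: f32, out: &mut [f32]) {
        self.inner.sample(interval, out);
        let target = f32::from_bits(self.target.load(Ordering::Relaxed));
        let duration = f32::from_bits(self.duration.load(Ordering::Relaxed));
        let alpha = if duration > 0.0 {
            (interval / duration).min(1.0)
        } else {
            1.0
        };
        let mut gain = self.current.get();
        for x in out {
            *x *= gain;
            gain += (target - gain) * alpha;
        }
        self.current.set(gain);
    }
}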

How to use Cycle to loop a sound?

Sorry to add this as an issue, I did not find another way to ask. I'm adding audio support for notan and I am doing some research and tests with oddio.

I'll implement audio using oddio+symphonia; this combo does all that I need. I was just wondering how to use Cycle to repeat audio. I tried to use it as part of the AudioHandle type but without success; I am not sure how I can chain this type and how to control it later.

Any hint or example will be awesome. Thanks!

Composable controls

If a source is wrapped in multiple filters with dynamic parameters (e.g. for #9), it can be difficult to access more than the outermost. We should provide some sort of helper to expose them equally.

DAG topology

oddio presently only has good support for signals and filters arranged in trees. It's unclear if there's a compelling use case for more complex graph topology; if you have one, please comment!

The primary difficulty in resolving this is that Signal::sample implicitly advances time, so if a single signal must be sent to multiple places, it must be buffered. Additionally, because multiple consumers might sample over differing amounts of time, the buffering logic is non-trivial.

One approach would be to define a special Fork filter that allows any number of handles to be produced (via an internal Arc) which tracks the relative positions of each handle, and buffers at least enough samples to cover the difference between the earliest and latest handle. To avoid dynamic allocation on the audio thread, a maximum buffer size would be needed up front.

Consider special-casing seekable signals in SpatialScene

SpatialScene buffers audio internally for each signal to implement propagation delay and the doppler effect. This guarantees good behavior for dynamic sources, but e.g. FramesSource can trivially yield data for any point in time and could therefore avoid the buffering, guaranteeing good behavior for any amount of propagation delay and marginally reducing overhead.

Migration from 0.5 to 0.6 Gain can't set initial volume

Hello!

I am migrating from 0.5 to 0.6 and I saw that Gain no longer allows setting the initial volume. To do that I should use FixedGain, but then it seems that GainControl cannot be used in the same way. Is there a way to set the volume before playback starts and still allow the user to change it afterwards?

This is the code I want to migrate: https://github.com/Nazariglez/notan/blob/f/prepare0.7.0/crates/notan_oddio/src/backend.rs#L290

And this is an example of how it works now using 0.5.0. You can set the volume before clicking pause and it will start at that level.

I would appreciate any hint. Thanks

FixedGain that implements Seek

A FixedGain (or StaticGain?) signal is useful for cases where one wants to set how loud something is when it starts playing, but never adjust the loudness while it's playing. This allows implementing Seek, which avoids needing a buffer for spatialization.

Here's a version I've implemented outside oddio:

use oddio::Frame;

pub struct FixedGain<T> {
    gain: f32,
    inner: T,
}

impl<T> FixedGain<T> {
    pub fn new(signal: T, gain: f32) -> Self {
        Self {
            gain,
            inner: signal,
        }
    }
}

impl<T: oddio::Signal> oddio::Signal for FixedGain<T>
where
    T::Frame: oddio::Frame,
{
    type Frame = T::Frame;

    fn sample(&self, interval: f32, out: &mut [Self::Frame]) {
        self.inner.sample(interval, out);
        for frame in out {
            for v in frame.channels_mut() {
                *v *= self.gain
            }
        }
    }
}

impl<T> oddio::Seek for FixedGain<T>
where
    T: oddio::Signal + oddio::Seek,
    T::Frame: Frame,
{
    fn seek(&self, seconds: f32) {
        self.inner.seek(seconds);
    }
}
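
For context, a hedged usage sketch along the lines of the README example at the top of this page (borrowing sample_rate, samples, position, velocity, and scene_handle from there); because FixedGain implements Seek, the plain play path should be usable without the spatialization buffer:

let signal = oddio::FramesSignal::from(oddio::Frames::from_slice(sample_rate, &samples));
// Amplitude is fixed at creation and never changes during playback
let quiet = FixedGain::new(signal, 0.5);
let mut handle = scene_handle
    .play(quiet, oddio::SpatialOptions { position, velocity, ..Default::default() });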

HRTF stereo

Directional audio is currently accomplished with per-ear propagation delays and direction-dependent amplitude. Convolution with head-related transfer functions (HRTFs) reportedly provides a stronger sense of direction, at the cost of per-source FFT convolution. A fast FFT implementation will be needed, likely employed via the overlap-save method.

Handling of real-time, low latency signals without discontinuities

My project currently uses ad-hoc code to gracefully handle playback of an inconsistent stream of frames. It can handle buffer overflows, underflows, and discontinuities (caused by packet drops):
https://github.com/alvr-org/ALVR/blob/master/alvr/audio/src/lib.rs
Now I would like to add support for mixing different audio sources. Oddio seems suited for this task, and at the same time I would use it to refactor and simplify the existing code. The problem is that the Stream signal API seems lacking for my use case: it cannot gracefully handle interruptions caused by buffer underflows; instead it stops abruptly, causing a “pop”, and there is no way of resuming the stream or detecting when the stream buffer has emptied.

The most sensible solution for me (idea n.1) is to make Stream never return true for is_finished(). Integrate support for a ramp down when the buffer has fewer than N frames, then resume with a ramp up once the buffer has been filled enough (halfway?). Optionally support interrupting a ramp down and resuming with a ramp up if frames become available soon enough. Other types of discontinuities, such as buffer overflow, can be detected by the current API and handled with the help of a Fader.
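
A hedged sketch of idea n.1's ramp logic, with the buffer reduced to a plain VecDeque for clarity (a real version would sit on oddio's lock-free ring and never allocate on the audio thread); the thresholds and names are made up:

use std::collections::VecDeque;

struct ResilientStream {
    buffer: VecDeque<f32>, // frames pushed by the network/decoder thread
    gain: f32,             // current ramp gain in 0.0..=1.0
    low_water: usize,      // start ramping down below this many frames
    resume_at: usize,      // ramp back up once at least this many are buffered
    draining: bool,
}

impl ResilientStream {
    /// Render one block; the stream never reports itself finished.
    fn render(&mut self, out: &mut [f32]) {
        // Hysteresis: ramp down when nearly empty, back up once refilled.
        if self.buffer.len() < self.low_water {
            self.draining = true;
        } else if self.buffer.len() >= self.resume_at {
            self.draining = false;
        }
        let step = 1.0 / out.len() as f32; // full ramp spans one block
        for x in out.iter_mut() {
            let sample = self.buffer.pop_front().unwrap_or(0.0);
            *x = sample * self.gain;
            self.gain = if self.draining {
                (self.gain - step).max(0.0)
            } else {
                (self.gain + step).min(1.0)
            };
        }
    }
}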

Another option (idea n.2) is to add support for polling when a Stream is going to run out of frames and let the user do a cross-fade with a 0 signal. But actually it would be better if this were handled with a callback. This might be more complex to implement and might not fit right with the current API.

Which option is better? I would be available to make a PR.

On another note, I think the method of resampling used inside Stream might distort the signal, especially high frequencies.

Spatial reverb

Reverb is intrinsically dependent on spatial data, so this probably needs to be baked into Spatial.

One effective strategy is to use a "feedback delay network" where sound reflections are streamed into many buffers representing spatial regions or directions that do not rotate with the viewer. Playback then samples from both the original source (direct) and the buffers (indirect), and the buffers loop back into each other continuously to support ∞-order reflections.

For efficiency, only one such buffer network should be allocated, allowing reverb processing to be O(buffers) rather than O(buffers * sources). This will require either some sort of sharing mechanism between Spatial instances, or refactoring of Spatial into an abstraction that itself owns and mixes sources. I'm leaning towards the former in hope of avoiding the complexity inherent in an additional case of Worker-like source ownership, though care will be necessary to support shared mutable state without UB if a user tries to run multiple workers concurrently.

Scene-dependent reverb is also interesting, though potentially complex. For small, hand-authored scenes, a FDN could be defined with buffers at manually-placed points with precomputed interreflections, in the spirit of real-time radiance. This is toil-intensive, however, and scales poorly to large scenes. One interesting possibility is a hierarchy of toroidally addressed buffers (clipmap style) that could be related with real-time geometry queries. Initial implementation should focus on something much simpler, but there's fertile ground for exploration, perhaps motivating making the whole reverb pipeline pluggable to support application-layer experimentation.

A promising reference: https://signalsmith-audio.co.uk/writing/2021/lets-write-a-reverb/
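
For concreteness, a hedged, minimal feedback delay network skeleton (four delay lines with a Householder feedback matrix). Delay lengths, damping, and the wiring into Spatial are left out; a spatialized version would feed reflections into direction-dependent lines rather than a single input.

struct Fdn {
    lines: [Vec<f32>; 4], // circular delay buffers
    pos: [usize; 4],
    feedback: f32, // overall decay, 0.0..1.0
}

impl Fdn {
    fn new(lengths: [usize; 4], feedback: f32) -> Self {
        Self {
            lines: lengths.map(|n| vec![0.0; n]),
            pos: [0; 4],
            feedback,
        }
    }

    /// Process one input sample and return the reverberated (wet) sample.
    fn tick(&mut self, input: f32) -> f32 {
        // Current output of each delay line
        let mut outs = [0.0f32; 4];
        for i in 0..4 {
            outs[i] = self.lines[i][self.pos[i]];
        }
        let wet = outs.iter().sum::<f32>() * 0.25;
        // Householder feedback matrix: reflect about (1,1,1,1), i.e. I - J/2
        let half_sum = outs.iter().sum::<f32>() * 0.5;
        for i in 0..4 {
            let fb = self.feedback * (outs[i] - half_sum);
            let len = self.lines[i].len();
            self.lines[i][self.pos[i]] = input + fb; // loop back into the network
            self.pos[i] = (self.pos[i] + 1) % len;
        }
        wet
    }
}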

SMPTE ST 2098 support

This is a cinema standard for object-based (i.e. mobile point source) 3D audio. It might be fun to provide tools for playing back such data streams, if they actually exist in the wild.
