wavers's Introduction

WaveRS - A Wav File Reader/Writer

WaveRs (pronounced wavers) is a Wav file reader/writer written in Rust and designed to be fast and easy to use. WaveRs is also available in Python through the PyWaveRs package.


Getting Started

This section provides a quick overview of the functionality offered by WaveRs to help you get started. WaveRs allows the user to read, write and convert between different types of sampled audio: currently i16, i32, f32 and f64. There is also experimental support for i24.

For more details on the project and wav files see the WaveRs Project section below. For more detailed information on the functionality offered by WaveRs see the docs.

Reading

use wavers::{read, Samples, Wav};

fn main() {
    let fp = "path/to/wav.wav";
    // Creates a Wav struct. Only the header information is read here, not the audio data.
    let mut wav: Wav<i16> = Wav::from_path(fp).unwrap();
    // Read the audio data through the Wav struct...
    let samples: Samples<i16> = wav.read().unwrap();
    // ...or read the audio data directly from the path.
    let (samples, sample_rate): (Samples<i16>, i32) = read::<i16, _>(fp).unwrap();
    // Samples can be dereferenced to a slice of samples.
    let samples: &[i16] = &samples;
}

Conversion

use wavers::{read, ConvertTo, Samples, Wav};

fn main() {
    // Two ways of converting a wav file
    let fp = "./path/to/i16_encoded_wav.wav";
    let mut wav: Wav<f32> = Wav::from_path(fp).unwrap();
    // Conversion happens automatically when you read.
    let samples: &[f32] = &wav.read().unwrap();

    // Or read as the native type and then call the convert function on the samples.
    let (samples, sample_rate): (Samples<i16>, i32) = read::<i16, _>(fp).unwrap();
    let samples: Samples<f32> = samples.convert();
    let samples: &[f32] = &samples;
}
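
To make the idea of converting between sample types concrete, here is a minimal standalone sketch of one common scaling convention between i16 and f32. The function names `i16_to_f32` and `f32_to_i16` are hypothetical, and this is not taken from the WaveRs source; the crate may scale or round differently.

```rust
/// Maps an i16 sample onto the range [-1.0, 1.0) by dividing by 2^15.
/// (Assumed convention, not necessarily the one WaveRs uses.)
fn i16_to_f32(sample: i16) -> f32 {
    sample as f32 / 32768.0
}

/// Maps an f32 sample back to i16, clamping out-of-range input first.
fn f32_to_i16(sample: f32) -> i16 {
    (sample.clamp(-1.0, 1.0) * 32767.0).round() as i16
}

fn main() {
    let original: [i16; 4] = [0, 16384, -16384, i16::MIN];
    let as_float: Vec<f32> = original.iter().map(|&s| i16_to_f32(s)).collect();
    println!("{:?}", as_float);

    let back: Vec<i16> = as_float.iter().map(|&s| f32_to_i16(s)).collect();
    println!("{:?}", back);
}
```

The two directions use different scale factors here (32768 and 32767), so a round trip is not guaranteed to be bit-exact. This is one of the details worth checking against the library's own behaviour.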

Writing

use wavers::{read, write, Samples, Wav};
use std::path::Path;

fn main() {
    let fp: &Path = Path::new("path/to/wav.wav");
    let out_fp: &Path = Path::new("out/path/to/wav.wav");

    // Two main ways: read and write using the same sample type...
    let mut wav: Wav<i16> = Wav::from_path(fp).unwrap();
    wav.write(out_fp).unwrap();

    // ...or read, convert, and write.
    let (samples, sample_rate): (Samples<i16>, i32) = read::<i16, _>(fp).unwrap();
    let n_channels = wav.n_channels();

    let samples: Samples<f32> = samples.convert();
    write(out_fp, &samples, sample_rate, n_channels).unwrap();
}

Iteration

WaveRs provides two primary methods of iteration: Frame-wise and Channel-wise. These can be performed using the Wav::frames and Wav::channels functions respectively. Both methods return an iterator over the samples in the wav file. The frames method returns an iterator over the frames of the wav file, where a frame is a single sample from each channel. The channels method returns an iterator over the channels of the wav file, where a channel is all the samples for a single channel.

use wavers::Wav;

fn main() {
    let mut wav: Wav<i16> = Wav::from_path("path/to/two_channel.wav").unwrap();
    for frame in wav.frames() {
        assert_eq!(frame.len(), 2, "The frame should have two samples since the wav file has two channels");
        // do something with the frame
    }

    // Compute the expected length up front so the wav is not queried while it is being iterated.
    let samples_per_channel = wav.n_samples() / wav.n_channels() as usize;
    for channel in wav.channels() {
        // do something with the channel
        assert_eq!(channel.len(), samples_per_channel, "The channel should have the same number of samples as the wav file divided by the number of channels");
    }
}
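
The relationship between frames and channels can also be illustrated without the crate. The sketch below assumes the usual interleaved layout of wav audio data (sample 0 of every channel, then sample 1 of every channel, and so on). The helpers `frame` and `channel` are hypothetical and only mimic what the two iterators yield.

```rust
/// Returns the frame at `index`: one sample from each channel.
fn frame(samples: &[i16], n_channels: usize, index: usize) -> &[i16] {
    &samples[index * n_channels..(index + 1) * n_channels]
}

/// Collects every sample that belongs to `channel` from the interleaved buffer.
fn channel(samples: &[i16], n_channels: usize, channel: usize) -> Vec<i16> {
    samples
        .iter()
        .skip(channel)
        .step_by(n_channels)
        .copied()
        .collect()
}

fn main() {
    // Two channels, three frames, stored as L0 R0 L1 R1 L2 R2.
    let samples: [i16; 6] = [10, -10, 20, -20, 30, -30];

    println!("frame 1   = {:?}", frame(&samples, 2, 1));
    println!("channel 0 = {:?}", channel(&samples, 2, 0));
    println!("channel 1 = {:?}", channel(&samples, 2, 1));
}
```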

Wav Utilities

use wavers::{wav_spec, Wav};

fn main() {
    let fp = "path/to/wav.wav";
    let wav: Wav<i16> = Wav::from_path(fp).unwrap();
    let sample_rate = wav.sample_rate();
    let n_channels = wav.n_channels();
    let duration = wav.duration();
    let encoding = wav.encoding();
    // wav_spec reads the duration and header without creating a Wav struct.
    let (duration, header) = wav_spec(fp).unwrap();
}

Check out wav_inspect for a simple command line tool to inspect the headers of wav files.

Features

The following section describes the features available in the WaveRs crate.

Ndarray

The ndarray feature provides functions that allow wav files to be read as ndarray 2-D arrays (samples x channels). There are two functions provided, into_ndarray and as_ndarray. Both functions create an Array2 from the samples and return it alongside the sample rate of the audio. into_ndarray consumes the input Wav struct and as_ndarray does not.

use wavers::{AsNdarray, IntoNdarray, Samples, Wav};
use ndarray::Array2;

fn main() {
    let fp = "path/to/wav.wav";
    let mut wav: Wav<i16> = Wav::from_path(fp).unwrap();

    // does not consume the wav file struct
    let (i16_array, sample_rate): (Array2<i16>, i32) = wav.as_ndarray().unwrap();

    // consumes the wav file struct
    let (i16_array, sample_rate): (Array2<i16>, i32) = wav.into_ndarray().unwrap();

    // convert the array back to samples.
    let samples: Samples<i16> = Samples::from(i16_array);
}

PyO3

The pyo3 feature is used to provide interoperability with Python, specifically the PyWavers project (use Wavers in Python).

The WaveRs Project

There were several motivating factors when deciding to write Wavers. Firstly, my PhD involves quite a bit of audio processing and I have been working almost exclusively with wav files using Python. Python is a fantastic language but I have always had issues with aspects such as baseline memory usage. Secondly, after being interested in learning a more low-level language and not being bothered with the likes of C/C++ (again), Rust caught my attention. Thirdly, I had to do a Speech and Audio processing module, which involved a project. Mixing all of these together led me to start this project and gave me a deadline and goals for an MVP of Wavers.

Rust also has a limited number of audio-processing libraries and even fewer specific to the wav format. Currently, the most popular wav reader/writer crate is hound, by ruuda. Hound is currently used as the wav file reader/writer for other Rust audio libraries such as Rodio, but Hound was last updated in September 2022. The biggest general-purpose audio-processing library for Rust though is CPAL. CPAL, the Cross-Platform Audio Library is a low-level library for audio input and output in pure Rust. Finally, there is also Symphonia, another general-purpose media encoder/decoder crate for Rust. Symphonia supports the reading and writing of wav files. Both CPAL and Symphonia are very low level and do not offer the ease of use that WaveRs strives for.

Project Goals

Anatomy of Wav File

The wave file format is a widely supported format for storing digital audio. A wave file uses the Resource Interchange File Format (RIFF) file structure and organizes the data in the file in chunks. Each chunk contains information about its type and size and can easily be skipped by software that does not understand the specific chunk type.

Excluding the RIFF chunk there is no guaranteed order to the remaining chunks and only the fmt and data chunk are guaranteed to exist. This means that rather than a specific structure for decoding chunks, chunks must be discovered by seeking through the wav file and reading the chunk ID and chunk size fields. Then if the chunk is needed, read the chunk according to the chunk format, and if not, skip ahead by the size of the chunk.
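
The discovery loop described above can be sketched with only the standard library. This is an illustration of the general technique rather than the code WaveRs uses internally. `list_chunks` and `sample_file` are hypothetical names, and the input is a tiny in-memory file built by hand.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Scans the chunks that follow the 12-byte RIFF header and returns
/// (chunk id, payload offset, payload size) for each chunk found.
fn list_chunks<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<(String, u64, u32)>> {
    let mut chunks = Vec::new();
    // Skip "RIFF", the RIFF size field and "WAVE".
    reader.seek(SeekFrom::Start(12))?;
    let mut header = [0u8; 8];
    loop {
        match reader.read_exact(&mut header) {
            Ok(()) => {}
            // Running out of bytes simply means there are no more chunks.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let id = String::from_utf8_lossy(&header[..4]).into_owned();
        let size = u32::from_le_bytes([header[4], header[5], header[6], header[7]]);
        let payload_offset = reader.stream_position()?;
        chunks.push((id, payload_offset, size));
        // RIFF chunks are word-aligned: an odd-sized payload is followed by a pad byte.
        let skip = size as i64 + (size % 2) as i64;
        reader.seek(SeekFrom::Current(skip))?;
    }
    Ok(chunks)
}

/// Builds a 48-byte in-memory file: RIFF header, a 16-byte fmt chunk, a 4-byte data chunk.
fn sample_file() -> Vec<u8> {
    let mut bytes = Vec::new();
    bytes.extend_from_slice(b"RIFF");
    bytes.extend_from_slice(&40u32.to_le_bytes()); // total size (48) less 8
    bytes.extend_from_slice(b"WAVE");
    bytes.extend_from_slice(b"fmt ");
    bytes.extend_from_slice(&16u32.to_le_bytes());
    bytes.extend_from_slice(&[0u8; 16]); // placeholder fmt payload
    bytes.extend_from_slice(b"data");
    bytes.extend_from_slice(&4u32.to_le_bytes());
    bytes.extend_from_slice(&[0u8; 4]); // two silent i16 samples
    bytes
}

fn main() -> io::Result<()> {
    let mut reader = Cursor::new(sample_file());
    for (id, offset, size) in list_chunks(&mut reader)? {
        println!("{:?}: payload at byte {}, {} bytes", id, offset, size);
    }
    Ok(())
}
```

A reader that only cares about the fmt and data chunks can use the returned offsets to jump straight to them and ignore everything else.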

The chunk types supported and used by Wavers are described below. There are plans to expand the number of supported chunks as time goes on and the library matures.

RIFF

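Total bytes = 12: 4 for the chunk ID, 4 for the size field and 4 for the RIFF type ID. The size field holds the size of the rest of the file, i.e. the total file size less the 8 bytes taken by the chunk ID and the size field itself.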

| Field | Length (bytes) | Offset (bytes) | Description |
| --- | --- | --- | --- |
| Chunk ID | 4 | 0 | The character string "RIFF" |
| Size | 4 | 4 | The size, in bytes, of the chunk |
| RIFF type ID | 4 | 8 | The character string "WAVE" |
| Chunks | $x$ | 12 | The other chunks |

fmt

Total bytes = 16 + 4 for the chunk size + 4 for the chunk name

| Field | Length (bytes) | Offset (bytes) | Description |
| --- | --- | --- | --- |
| Chunk ID | 4 | 0 | The character string "fmt " (note the space!) |
| Size | 4 | 4 | The size, in bytes, of the chunk |
| Compression Code | 2 | 8 | Indicates the format of the wav file, e.g. PCM = 1 and IEEE Float = 3 |
| Number of Channels | 2 | 10 | The number of channels in the wav file |
| Sampling Rate | 4 | 12 | The rate at which the audio is sampled |
| Byte Rate | 4 | 16 | Bytes per second |
| Block Align | 2 | 20 | Minimum atomic unit of data |
| Bits per Sample | 2 | 22 | The number of bits per sample, e.g. PCM_16 has 16 bits |
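
As a rough illustration of how these fields are decoded, the sketch below parses the 16 payload bytes of a fmt chunk. Offsets in the code are measured from the start of the payload, i.e. after the 8 bytes of chunk ID and size. `FmtChunk` and `parse_fmt` are hypothetical names and not part of the WaveRs API. All multi-byte fields in a wav file are little-endian.

```rust
#[derive(Debug, PartialEq)]
struct FmtChunk {
    format_code: u16,
    n_channels: u16,
    sample_rate: u32,
    byte_rate: u32,
    block_align: u16,
    bits_per_sample: u16,
}

/// Decodes the basic 16-byte fmt payload. Returns None if it is too short.
fn parse_fmt(payload: &[u8]) -> Option<FmtChunk> {
    if payload.len() < 16 {
        return None;
    }
    let u16_at = |i: usize| u16::from_le_bytes([payload[i], payload[i + 1]]);
    let u32_at = |i: usize| {
        u32::from_le_bytes([payload[i], payload[i + 1], payload[i + 2], payload[i + 3]])
    };
    Some(FmtChunk {
        format_code: u16_at(0),
        n_channels: u16_at(2),
        sample_rate: u32_at(4),
        byte_rate: u32_at(8),
        block_align: u16_at(12),
        bits_per_sample: u16_at(14),
    })
}

fn main() {
    // PCM (1), mono, 16 kHz, 16 bits per sample.
    let mut payload = Vec::new();
    payload.extend_from_slice(&1u16.to_le_bytes());
    payload.extend_from_slice(&1u16.to_le_bytes());
    payload.extend_from_slice(&16_000u32.to_le_bytes());
    payload.extend_from_slice(&32_000u32.to_le_bytes()); // sample_rate * block_align
    payload.extend_from_slice(&2u16.to_le_bytes()); // n_channels * bits_per_sample / 8
    payload.extend_from_slice(&16u16.to_le_bytes());

    println!("{:?}", parse_fmt(&payload));
}
```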

Data Chunk

Total bytes = 4 for the chunk name + 4 for the size and then $x$ bytes for the data.

| Field | Length (bytes) | Offset (bytes) | Description |
| --- | --- | --- | --- |
| Chunk ID | 4 | 0 | The character string "data" |
| Size | 4 | 4 | The size of the data in bytes |
| Data | $x$ | 8 | The encoded audio data |
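
Putting the RIFF, fmt and data layouts together, a minimal PCM_16 header is 44 bytes long (12 + 24 + 8). The sketch below builds one by hand to show how the size fields relate. `pcm16_header` is a hypothetical helper, not the WaveRs writer, and it assumes no chunks other than fmt and data are present.

```rust
/// Builds the 44-byte header of a PCM_16 wav file whose data chunk holds `data_len` bytes.
fn pcm16_header(n_channels: u16, sample_rate: u32, data_len: u32) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align: u16 = n_channels * bits_per_sample / 8;
    let byte_rate: u32 = sample_rate * block_align as u32;

    let mut h = Vec::with_capacity(44);
    // RIFF chunk: the size is everything after this field, i.e. file size less 8.
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&(36 + data_len).to_le_bytes());
    h.extend_from_slice(b"WAVE");
    // fmt chunk: 16-byte payload describing the encoding.
    h.extend_from_slice(b"fmt ");
    h.extend_from_slice(&16u32.to_le_bytes());
    h.extend_from_slice(&1u16.to_le_bytes()); // PCM
    h.extend_from_slice(&n_channels.to_le_bytes());
    h.extend_from_slice(&sample_rate.to_le_bytes());
    h.extend_from_slice(&byte_rate.to_le_bytes());
    h.extend_from_slice(&block_align.to_le_bytes());
    h.extend_from_slice(&bits_per_sample.to_le_bytes());
    // data chunk header: the encoded samples follow immediately after.
    h.extend_from_slice(b"data");
    h.extend_from_slice(&data_len.to_le_bytes());
    h
}

fn main() {
    let data_len = 8; // four mono i16 samples
    let header = pcm16_header(1, 16_000, data_len);
    let file_size = header.len() as u32 + data_len;
    let riff_size = u32::from_le_bytes([header[4], header[5], header[6], header[7]]);
    println!("header = {} bytes, file = {} bytes, RIFF size = {}", header.len(), file_size, riff_size);
}
```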

Benchmarks

The benchmarks below were recorded using Criterion and each benchmark was run on a small dataset of wav files taken from the GenSpeech data set. The durations vary between approx. 7s and 15s and each file is encoded as PCM_16. The data set contains 10 files. The results below are the time taken to load all the wav files in the data set, so the time per file is the total time divided by the number of files. There are some suspected anomalies in the benchmarks which warrant further investigation. The benchmarks were run on a desktop PC with the following (relevant) specs:

  • CPU: 13th Gen Intel® Core™ i7-13700KF
  • RAM: 32 GB DDR4
  • Storage: 1 TB SSD

Hound vs Wavers - native i16

| benchmark | name | min_time | mean_time | max_time |
| --- | --- | --- | --- | --- |
| Hound vs Wavers - native i16 | Hound Read i16 | 7.4417 ms | 7.4441 ms | 7.4466 ms |
| Hound vs Wavers - native i16 | Wavers Read i16 | 122.42 µs | 122.56 µs | 122.72 µs |
| Hound vs Wavers - native i16 | Hound Write i16 | 2.1900 ms | 2.2506 ms | 2.3201 ms |
| Hound vs Wavers - native i16 | Wavers Write i16 | 5.9484 ms | 6.2091 ms | 6.5018 ms |

Reading

| benchmark | name | min_time | mean_time | max_time |
| --- | --- | --- | --- | --- |
| Reading | Native i16 - Read | 121.28 µs | 121.36 µs | 121.44 µs |
| Reading | Native i16 - Read Wav File | 121.56 µs | 121.79 µs | 122.08 µs |
| Reading | Native i16 As f32 | 287.63 µs | 287.78 µs | 287.97 µs |

Writing

| benchmark | name | min_time | mean_time | max_time |
| --- | --- | --- | --- | --- |
| Writing | Slice - Native i16 | 5.9484 ms | 6.2091 ms | 6.5018 ms |
| Writing | Slice - Native i16 As f32 | 30.271 ms | 33.773 ms | 37.509 ms |
| Writing | Write native f32 | 11.286 ms | 11.948 ms | 12.648 ms |

wavers's People

Contributors

jmg049, jackgeraghty, jdbosser

wavers's Issues

Docs: Unable to access `ChannelIterator`, `FrameIterator` and `WavInfo` at docs.rs

Hi!

Looking at the docs.rs, I see that there are no links, nor am I able to search for, the structs mentioned in the title of this issue.

I have figured out that it is because the sub-modules this crate defines (in src/lib.rs) are not public, even though sub-modules have public functions and are well documented. Is this intentional?

I have naively opened a pr #21, that makes the sub-modules public, which solves the issue. If this is "too" public, maybe there is a middle ground that can be found?

Test #1: Load testing

A segfault appeared when loading multiple files in a for loop, specifically for 32-bit floats and likely also for 64-bit floats and 32-bit ints. This was hopefully addressed in this commit by fixing the values used to initialize the sizes and capacities of the underlying buffers. However, due to time constraints, this has yet to be tested in Python and the root cause has not been determined for sure.

Improve Benchmarking

The benchmarking part of wavers definitely needs some attention. It probably needs a defined set of tests that cover a wide range of wav file scenarios, for example, reading and writing signals of various lengths and comparing against hound.

It would also be great to automate the output of the benchmarking to some template file. This file would be generated and converted into something like markdown or latex.

List Chunk is Broken

I pushed a non-finalised version of this months ago. Need to get around to finishing it off.

String representation of Wav struct

Currently the Wav struct cannot be printed/formatted. It might not have much to print, but it could at least show some of the wav header and maybe some other items like duration.

ndarray -> Samples?

Hi,

Thanks for this lib.
I'm trying to operate on a wav file as ndarray and then convert back once done to write to a file.
I managed to get the first part correctly but I'm stuck now on how to write it back to a file / cast it as Samples.

By quickly looking at the source I didn't find anything explicit for the other way around.

Is it possible?

Thanks

New README

Much of the README file is outdated and incorrect/inaccurate as of the current version (1.4.3, I think) of wavers.

Add more descriptive error messages for WaveType parsing

Currently we just get "Invalid wav type" for anything not within the 4 supported types.
24-bit PCM is really popular in audio engineering contexts, which left me a bit confused when none of my files could be parsed. I later found out it's a planned feature, but it would be nice to get a more descriptive error in the meantime.

Maybe something like

impl TryFrom<(u16, u16)> for WavType {
    type Error = WaversError;

    fn try_from(value: (u16, u16)) -> Result<Self, Self::Error> {
        Ok(match value {
            (1, 16) => WavType::Pcm16,
            (1, 32) => WavType::Pcm32,
            (3, 32) => WavType::Float32,
            (3, 64) => WavType::Float64,
            (1, bits_per_sample) => {
                return Err(WaversError::InvalidType(format!(
                    "No PCM encoding for {} bits per sample",
                    bits_per_sample
                )))
            },
            (3, bits_per_sample) => {
                return Err(WaversError::InvalidType(format!(
                    "No Float encoding for {} bits per sample",
                    bits_per_sample
                )))
            },
            _ => return Err(WaversError::InvalidType("Unsupported encoding type".into())),
        })
    }
}

"Feature may not be used on the release stable channel"

I have Rust 1.73.0 installed on Mac OS. I run "cargo add wavers" and "cargo run".

Cargo prints:

   Compiling wavers v1.0.1
error[E0554]: `#![feature]` may not be used on the stable release channel
 --> /Users/mcc/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wavers-1.0.1/src/lib.rs:1:12
  |
1 | #![feature(const_type_id)]
  |            ^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0554`.
error: could not compile `wavers` (lib) due to previous error

I do not know what that means exactly.

Expected behavior: If something is published in Cargo, it should work in stable Rust, or at minimum there should be clear warnings in the documentation / crates.io page that it requires unstable.

Logging & Improved Errors

Currently there is no logging of any kind going on. Should probably add some in using the likes of log.rs.

I also feel that the error handling might need a once-over at least, even just to make sure it's tidy.

Incorrect RIFF Chunk Size in Header of Written WAV Files

I encountered an issue where the RIFF chunk size in the header of written WAV files does not match the expected value of file_size - 8.

Steps to Reproduce:

  1. Create a WAV file using wavers.
  2. Read the file size and the RIFF chunk size from the header.
  3. Compare the RIFF chunk size to file_size - 8.

Expected Behavior:
The RIFF chunk size in the header should be equal to file_size - 8.

Actual Behavior:
The RIFF chunk size does not match the expected value.

Code to Reproduce:

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use wavers::{write, Samples};

fn main() {
    let fp = "./wav.wav";
    let sr: i32 = 16000;
    let duration = 10;
    let mut samples: Vec<f32> = (0..sr * duration).map(|x| (x as f32 / sr as f32)).collect();
    for sample in samples.iter_mut() {
        *sample *= 440.0 * 2.0 * std::f32::consts::PI;
        *sample = sample.sin();
        *sample *= i16::MAX as f32;
    }
    let samples: Samples<f32> = Samples::from(samples.into_boxed_slice()).convert();
    assert!(write(fp, &samples, sr, 1).is_ok());

    let mut file = File::open(fp).expect("Failed to open the WAV file");

    let file_size = file.metadata().expect("Failed to get file metadata").len();
    println!("File size: {} bytes", file_size);

    file.seek(SeekFrom::Start(4)).expect("Failed to seek in the file");

    let mut riff_chunk_size_bytes = [0u8; 4];
    file.read_exact(&mut riff_chunk_size_bytes).expect("Failed to read RIFF chunk size");
    let riff_chunk_size = u32::from_le_bytes(riff_chunk_size_bytes);
    println!("RIFF chunk size: {} bytes", riff_chunk_size);

    let expected_riff_chunk_size = file_size as u32 - 8;
    assert_eq!(riff_chunk_size, expected_riff_chunk_size, "RIFF chunk size does not match expected value");
    println!("RIFF chunk size is correct.");
}

n_samples returns wrong number when converting between differently sized sample formats

Reproducible example:

// The wav file behind fp is encoded with s16le sample format and contains
// one channel and 2048 samples. We are converting the samples to `f32`
// on the fly.
let wav: Wav<f32> = Wav::from_path(fp).unwrap();

// This fails!
assert_eq!(wav.n_samples(), 2048);

// This succeeds, so n_samples is returning half the expected number.
assert_eq!(wav.n_samples(), 1024);

I guess in the n_samples function, the total size of the data chunk is divided by the size of one sample after the conversion to f32. Since an f32 takes up twice as many bytes as i16, the outcome is half of the actual number of samples in the file.
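
The arithmetic behind that guess can be shown in isolation. The sketch below is not the WaveRs implementation. `n_samples_from_target` and `n_samples_from_header` are hypothetical functions that contrast dividing by the size of the requested type with dividing by the sample size recorded in the fmt chunk.

```rust
use std::mem::size_of;

/// Suspected behaviour: divide the data chunk size by the size of the *requested* type.
fn n_samples_from_target<T>(data_bytes: usize) -> usize {
    data_bytes / size_of::<T>()
}

/// Expected behaviour: divide by the sample size the file was actually encoded with.
fn n_samples_from_header(data_bytes: usize, bits_per_sample: u16) -> usize {
    data_bytes / (bits_per_sample as usize / 8)
}

fn main() {
    // 2048 samples of s16le occupy 4096 bytes in the data chunk.
    let data_bytes = 2048 * size_of::<i16>();

    println!("divided by size_of::<f32>(): {}", n_samples_from_target::<f32>(data_bytes));
    println!("divided by header sample size: {}", n_samples_from_header(data_bytes, 16));
}
```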

Add support for extended Wav format

Originally, Wavers was meant to be as simple as possible, and for the first major version, this meant just the original wav format. However, it was raised in Issue #10, that established tools, such as ffmpeg, produce wav files in the extended format.

Wavers really should be compatible with such tools as people using Wavers are likely to be using them, too.

I have started working locally on this issue and will publish a wav_extension branch that will track changes. I am basing the implementation around this. I am still looking for a more definitive list of the sub-formats.

Thanks to EmNudge for raising the original issue and providing some great references on it.

Test #2: Test file for each sample format

Need proper coverage of all functionality. It would be reassuring to have a set of test audio files that covers each possibility in terms of i16, i32, f32 and f64. Ties in with this benchmark issue and this test issue.

Documentation: How are channels accessed?

Hi!

I was wondering how the different channels are accessed. I have a wav-file containing 56 channels, and I am planning to read this sample per sample for all the channels.

Hence, I want to (pseudo-code)

let mut wav: Wav<f32> = Wav::from_path("test.wav")?;
let samples = wav.read_samples(1);

and would have expected to get the first sample from each of the 56 channels, that is, a vector or an array containing 56 elements. The result of the above is that I get one value out.

So, I assume that I need to read 56 samples if I want to get the values of all channels at the first time instance? Or is there something else I need to do?

I read in the documentation that channel iteration is a planned feature. However, if the current behavior of read_samples is to remain, I think we could add an additional line in the documentation on the structure of the returned samples in case of multiple channels.

Resampling Feature

Already added rubato as an optional dependency.

Need to figure out what's the best way to do resampling. Mostly where to expose it. An option when reading? Another Wav function call and conversion, consume self and return a new version of the Wav struct with a new header? Make it a standalone function and then pass a Wav into it?

Add Support for Seeking

I've noticed that internally the crate has an advance_pos function on Wav, but this function is not public.

I'm writing code that works with snippets of long wav files, and oftentimes the section that needs to be extracted is near the end of the file. Since the file is potentially massive I want to avoid reading the entire thing into memory, or reading a bunch of unneeded samples. Right now, looking at the public API, two ways of doing this immediately come to mind:

  1. Create a Wav and call read_sample(s) until I'm at the snippet I care about
  2. Create a Box<dyn ReadSeek> myself, seek manually, and then create a Wav using this reader.

Neither of these feels great. (1) will end up doing a ton of allocs and is inefficient. (2) seems hacky, as it's basically rewriting (part of) the logic of advance_pos outside the crate.

It would be nice if there were a way using the API to seek to a given sample. For example, seek to the 1 millionth sample just by seeking the underlying reader.
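
For illustration, the arithmetic such a seek would involve can be sketched with the standard library alone. This is not an existing WaveRs function. `seek_to_sample`, `read_i16` and `fake_wav` are hypothetical, and the sketch assumes the byte offset of the data payload (`data_start`) and the encoded sample size are already known from the header.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Moves the reader to the first byte of `sample_index` and returns the new position.
fn seek_to_sample<R: Seek>(
    reader: &mut R,
    data_start: u64,
    bytes_per_sample: u64,
    sample_index: u64,
) -> io::Result<u64> {
    reader.seek(SeekFrom::Start(data_start + sample_index * bytes_per_sample))
}

/// Reads one little-endian i16 sample from the current position.
fn read_i16<R: Read>(reader: &mut R) -> io::Result<i16> {
    let mut buf = [0u8; 2];
    reader.read_exact(&mut buf)?;
    Ok(i16::from_le_bytes(buf))
}

/// 44 zero bytes stand in for the header, followed by the i16 samples 0..100.
fn fake_wav() -> Vec<u8> {
    let mut bytes = vec![0u8; 44];
    for s in 0i16..100 {
        bytes.extend_from_slice(&s.to_le_bytes());
    }
    bytes
}

fn main() -> io::Result<()> {
    let mut reader = Cursor::new(fake_wav());
    let pos = seek_to_sample(&mut reader, 44, 2, 90)?;
    let sample = read_i16(&mut reader)?;
    println!("byte offset {} holds sample value {}", pos, sample);
    Ok(())
}
```

For multi-channel files the same idea applies per frame: multiply the frame index by the number of channels before computing the byte offset.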

Thanks for maintaining WaveRs! Not sure what contributing looks like, but I'd be happy to try and implement this if it seems useful.
