The 16 bits used to communicate the frame size are not necessary: the first 8 bits of each frame's header already contain the number of channels, and from that the total frame size can be calculated, because all arrays depend only on the number of channels:
sizeof(frame_header) + num_channels * (sizeof(lms_state) + sizeof(qoa_slice_t) * 256)
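The computation above can be sketched as a small helper. The constants reflect the per-frame sizes in the QOA spec (an 8-byte frame header, 16 bytes of LMS state and 256 eight-byte slices per channel); the function and macro names are just for illustration:

```c
#include <stdint.h>

/* Per the QOA spec: 8-byte frame header, 16-byte LMS state per channel
   (4 history + 4 weight values, 2 bytes each), and 256 slices of
   8 bytes per channel. */
#define QOA_FRAME_HEADER_SIZE 8
#define QOA_LMS_STATE_SIZE    16
#define QOA_SLICE_SIZE        8
#define QOA_SLICES_PER_FRAME  256

/* Total size in bytes of a full frame with the given channel count */
static uint32_t qoa_frame_size_from_channels(uint32_t num_channels) {
	return QOA_FRAME_HEADER_SIZE + num_channels *
		(QOA_LMS_STATE_SIZE + QOA_SLICE_SIZE * QOA_SLICES_PER_FRAME);
}
```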
By dropping the frame size bits, and with them the size limit imposed by an unsigned 16-bit value, we could theoretically also drop this channel limit:
#define QOA_MAX_CHANNELS 8
As a suggestion, I would propose including more metadata to improve seekability through frames:
As it stands, each frame can change the number of channels and/or the sample rate, which means you need to read every frame header to seek through an audio stream. Even if the number of channels stays constant, you can only seek to certain sample offsets; you cannot seek to a timestamp, or even calculate the timestamp after seeking, without decoding all frames in between.
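To illustrate the status quo, here is a hypothetical sketch of seeking under the current format: every frame header (channels, sample rate, samples in frame, frame size) has to be read on the way to the target sample. The function name and buffer layout are assumptions for the example:

```c
#include <stdint.h>
#include <stddef.h>

/* Walk frame headers to find the byte offset of the frame containing
   target_sample. QOA frame headers are 8 bytes: channels (8 bits),
   sample rate (24 bits), samples per channel (16 bits), frame size
   (16 bits), all big-endian. */
static size_t qoa_seek_naive(const uint8_t *bytes, size_t len,
                             uint64_t target_sample) {
	size_t p = 8; /* skip the 8-byte file header */
	uint64_t sample = 0;
	while (p + 8 <= len) {
		uint16_t fsamples = (uint16_t)((bytes[p+4] << 8) | bytes[p+5]);
		uint16_t fsize    = (uint16_t)((bytes[p+6] << 8) | bytes[p+7]);
		if (sample + fsamples > target_sample) {
			return p; /* target lies in this frame */
		}
		sample += fsamples;
		p += fsize; /* the 16-bit frame size is what makes this walk possible */
	}
	return p;
}
```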
If we use the freed 16 bits to encode additional metadata, we could include something like this in the frame header:
bits 0123456789abcdef
     tvvvvvvvvvvvvvvv

t = 0: you need to decode all frames
t = 1: the value `v` (15 bits) indicates the number of following frames that do not deviate in number of channels or sample rate
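Packing and unpacking this proposed field is trivial; a sketch, with all names hypothetical:

```c
#include <stdint.h>

/* Proposed layout for the freed 16 bits: the top bit is the 't' flag,
   the low 15 bits are 'v', the number of following frames with
   identical channel count and sample rate. */
static uint16_t qoa_pack_tv(int t, uint16_t v) {
	return (uint16_t)((t ? 0x8000 : 0) | (v & 0x7FFF));
}

static int qoa_tv_flag(uint16_t field) {
	return (field >> 15) & 1;
}

static uint16_t qoa_tv_frames(uint16_t field) {
	return field & 0x7FFF;
}
```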
For live streaming you would use t = 0, but for streams encoded ahead of time you could set t = 1 together with the number of frames for which the encoder is certain that the number of channels and sample rate aren't going to change. Even if the encoder isn't sure ahead of time, it could write the correct value after the fact, provided its output is seekable.
This way, you only need to read the first frame's header to be able to seek to any timestamp as well.
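With t = 1, every one of the following v frames shares the first frame's channel count, so all of them have the same byte size and (except possibly the last) hold the same 256 × 20 = 5120 samples per channel. Seeking then becomes a direct computation; a sketch under those assumptions, with hypothetical names:

```c
#include <stdint.h>
#include <stddef.h>

/* A QOA frame holds 256 slices of 20 samples each, per channel */
#define QOA_SAMPLES_PER_FRAME (256 * 20)

/* Byte offset of the frame containing target_sample, given that all
   frames share one size (derived from the channel count in the first
   frame's header) */
static size_t qoa_seek_direct(size_t first_frame_offset,
                              size_t frame_size,
                              uint64_t target_sample) {
	uint64_t frame_index = target_sample / QOA_SAMPLES_PER_FRAME;
	return first_frame_offset + (size_t)frame_index * frame_size;
}
```

The timestamp after seeking falls out of the same arithmetic: frame_index * 5120 divided by the sample rate.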