Hi, I'm looking at the code and trying to guess where to change it to reduce the bitrate / c

Lyra bitrate is way too high for a vocoder. How to reduce the bitrate?

google commented on August 16, 2024

Lyra bitrate is way too high for a vocoder. How to reduce the bitrate?

Comments (18)

trholding commented on August 16, 2024

I also had a use case for HF. I got it down to 1600 bps with okay voice, 1040 bps - recognizable, and 800 bps - difficult to understand but better than nothing. I will explain when I find time - I'm still doing research and still trying to understand neural codecs. I think at 1600 bps it sounds much better than codec2 at its highest bitrate.
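To put these figures in perspective, a bitrate can be converted into a per-frame bit budget and into the payload size of a voice message. This is a minimal sketch assuming 20 ms frames (50 frames per second) and ignoring packet headers; the frame duration and the helper names are illustrative assumptions, not taken from Lyra's code.

```python
# Hypothetical helpers relating bitrate, frame duration and payload size.
# The default 20 ms frame duration is an assumption for illustration.

def bits_per_frame(bitrate_bps: float, frame_ms: float = 20.0) -> float:
    """Bits a codec may spend on each frame at the given bitrate."""
    return bitrate_bps * frame_ms / 1000.0


def message_bytes(bitrate_bps: float, seconds: float) -> float:
    """Payload size of a voice message of the given length, ignoring framing overhead."""
    return bitrate_bps * seconds / 8.0


for bps in (3200, 1600, 1040, 800):
    print(f"{bps:>5} bps: {bits_per_frame(bps):5.1f} bits/frame, "
          f"{message_bytes(bps, 10):6.0f} bytes per 10 s message")
```

Under the 20 ms assumption, 1600 bps leaves 32 bits per frame and 800 bps leaves 16, and a 10 s voice message at 800 bps is about 1000 bytes of payload.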

trholding commented on August 16, 2024

@rafael2k I think Google builds stuff for the general public and mostly for their business case. Narrowband is used by few people, like hams and IoT projects, so we can't blame the engineers. Yes, this could be a "Won't fix". As I understand it, at Google different parts of software projects are owned by different teams, so a request here probably won't help.

@aluebs In the future, the narrowband use case could have a big impact on the lives of millions, if not billions, of people without proper coverage or connectivity. So I strongly encourage Google to support the narrowband use case, open up the training pipeline, and make lyra2 an RFC standard of sorts.

I also encourage you to make important projects available with a CMake / Make build system, which is sort of the external standard, instead of only Bazel, Google's own build system. I do wish that you brilliant engineers would pester management to open up projects, sources and pipelines, such as Lyra's training pipeline, that could be extremely useful for the whole of humankind.

It is misleading that your marketing team called Lyra "open source" when a key component, i.e. the training pipeline, is closed source. I'm disappointed with that.

There is an opportunity to reduce network bandwidth by 5% to 20% by optimizing Lyra's bitstream. I verified it experimentally; however, since the training is not open source, I have no incentive to share my findings. But you could figure it out and implement the improvement just by looking at the bitstream.
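The poster does not say what the optimization is. One generic way to estimate how much redundancy a fixed-width bitstream carries is to compare the empirical entropy of the transmitted codebook indices with their fixed width; the gap approximates what a memoryless entropy coder could save on that stream. This is a sketch under that assumption, with hypothetical function names; it treats indices as independent, so it ignores any extra gain from modelling correlations between frames.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of the observed symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def potential_saving(symbols, codebook_size):
    """Fraction of a fixed-width index stream a memoryless entropy coder could save."""
    fixed_bits = math.log2(codebook_size)
    return 1.0 - empirical_entropy_bits(symbols) / fixed_bits


# Toy index stream for a 4-entry codebook (2 bits per index when sent fixed-width).
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
print(empirical_entropy_bits(indices))  # 1.75
print(potential_saving(indices, 4))     # 0.125
```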

I highly appreciate the work your team has done - lyra2 is brilliant. I learned something from it; thank you for that.

rafael2k commented on August 16, 2024

Thanks! I guessed the main use case was VoIP; that makes sense. I'll study a bit more how the code works and how the NN is set up and trained.

/* And congratulations on the great work! A great contribution to society. */

EDIT: Google did not open the training pipeline, and the material used to train the NN also seems to be closed. The propaganda about Lyra's quality does not match the quality of the published software + model (it is not even possible to know whether what is shown on Google's blogs is fake or real).

trholding commented on August 16, 2024

@rafael2k I am aware of the 450 bps codec2 mode; it didn't cut it for my use case, but that's great work.

I have a feeling that the quality of the end result depends on pre-processing the audio (for example, I had better results with filtered, denoised and companded input audio) and on using the right kind of models, i.e. models trained on the kind of signal degradation encountered in the field, as well as on the type of NN + model (Lyra is more forgiving).
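Two of the pre-processing steps mentioned can be sketched as a first-order pre-emphasis (high-pass) filter and μ-law companding. The poster does not say which filter or companding law was used: μ = 255 and a pre-emphasis coefficient of 0.97 are common choices assumed here for illustration, samples are assumed to be floats in [-1, 1], and denoising is left out.

```python
import math


def pre_emphasis(samples, alpha: float = 0.97):
    """First-order high-pass y[n] = x[n] - alpha * x[n-1], a common speech pre-filter."""
    out, prev = [], 0.0
    for s in samples:
        out.append(s - alpha * prev)
        prev = s
    return out


def mu_law_compress(x: float, mu: float = 255.0) -> float:
    """Map a sample in [-1, 1] through the mu-law curve, boosting quiet samples."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = 255.0) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


# Hypothetical chain: filter, then compand, before handing samples to the encoder.
samples = [0.0, 0.01, 0.1, 0.5, -0.5]
companded = [mu_law_compress(s) for s in pre_emphasis(samples)]
```

Because μ-law boosts quiet samples relative to loud ones, the receiver would apply `mu_law_expand` after decoding if the original dynamics are wanted back.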

The biggest challenge is being able to run in real time or faster on CPUs and CPU-constrained devices; a bit of OpenCL is okay, I guess, as most SoCs come with GPUs.
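A common way to check "real time or faster" is the real-time factor (RTF): processing time divided by the duration of the audio processed, where an RTF below 1.0 means faster than real time. A minimal timing sketch; `process` stands for whatever encode or decode call is being measured and is a placeholder, not a Lyra API.

```python
import time
from typing import Callable


def real_time_factor(process: Callable[[], object], audio_seconds: float) -> float:
    """Wall-clock time of process() divided by the audio duration it handles.

    RTF < 1.0 means the work finished faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


# Placeholder workload standing in for decoding 10 s of audio.
rtf = real_time_factor(lambda: sum(i * i for i in range(100_000)), 10.0)
print(f"RTF = {rtf:.4f}")
```

Measuring on the target device itself, over several runs, gives a more honest number than a desktop benchmark.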

Let's close this, as the subject is becoming less about Lyra. Shall we move this conversation to your git or mine? trholding/audiosamples#1

@aluebs Thanks for the great work and pointers.

rafael2k commented on August 16, 2024

At 1600 bps, does it sound better than LPCNet (which also runs at 1600 bps)? Comparing to codec2 is not a fair comparison, as codec2 is not ML-based.

trholding commented on August 16, 2024

@rafael2k I have attached two files: the ground truth and the encoded sound, both converted to mp3.

https://github.com/trholding/audiosamples/blob/master/1600bps_GT.zip

I've been changing stuff a bit more, so the current 1600 bps output I have attached sucks a bit. But I do believe the Lyra project itself won't go down in bitrate, as this is no longer voice-chat quality.

I'll check the LPCNet output... I had some outputs in /tmp but I cleared them out. If I remember correctly, encoding was fast, decoding not so. I'll convert, check and get back.

Yeah, one can't compare it with codec2, which is awesome; my hope is to encourage @drowe67 to consider creating codec3.

I think if I go way down in bitrate, even with degraded output, and pass it to a neural decoder trained on both degraded (ham transmissions, extremely noisy audio, etc.) and clean audio, we could have good speech at very low bitrates that could probably be sent over ?FT8 transatlantic or ?EME bounce :).
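Training a decoder on both degraded and clean audio needs paired examples. One common way to make them is to add recorded noise to clean speech at a chosen signal-to-noise ratio and keep the clean signal as the target. This sketch assumes that approach and uses hypothetical helper names; real ham-band degradation (fading, filtering, interference) would need more than additive noise.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise, with the noise scaled so the mix has snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[:len(clean)]
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(p_clean / (scale**2 * p_noise)), solved for scale.
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(clean, noise)]


def make_training_pair(clean, noise, snr_db):
    """(input, target) pair: the decoder learns to map degraded speech back to clean."""
    return mix_at_snr(clean, noise, snr_db), clean


# Toy data: a 200 Hz tone at an assumed 8 kHz sample rate, plus Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
noisy, target = make_training_pair(clean, noise, snr_db=5.0)
```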

Real time may be possible... Maybe even voice cloning could help: send voice vectors for the final stage to get the audio output right. It's a bit complicated; I'm still wrapping my head around AI / ML stuff. What sucks is that every reference is in Python, and I prefer C/C++.

I wish some hams could donate hours of their clean speech and noisy speech to do some training on. I must also think about getting sponsored GPU compute, because I heard it's quite expensive to train.

I am also working on on-the-fly codec switching every few tens of ms / frames based on audio input and output quality, i.e. a variable codec and bitrate (a kludge). I'm also playing with a bit of filtering and frequency shifting and checking the quality; it's a trial-and-error process. I hope to release something in two weeks; maybe Lyra itself could make use of some of the tricks.
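The per-frame switching idea can be sketched as: for each frame, try the available bitrates from lowest to highest, keep the first whose quality score clears a threshold, and fall back to the highest otherwise. The `quality` callback is a hypothetical stand-in (in practice it might encode, decode and compare against the input), and the bitrate list is illustrative.

```python
import math
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def choose_modes(frames: Sequence[Frame], bitrates: Sequence[int],
                 quality: Callable[[Frame, int], float],
                 threshold: float) -> List[int]:
    """Pick, per frame, the lowest bitrate whose quality score meets the threshold."""
    ordered = sorted(bitrates)
    chosen = []
    for frame in frames:
        pick = ordered[-1]  # fall back to the highest bitrate
        for bps in ordered:
            if quality(frame, bps) >= threshold:
                pick = bps
                break
        chosen.append(pick)
    return chosen


def mode_signalling_bits(num_modes: int) -> int:
    """Side-information bits per frame needed to tell the receiver which mode was used."""
    return math.ceil(math.log2(num_modes)) if num_modes > 1 else 0


# Toy example: each "frame" is just the bitrate it needs to sound acceptable.
frames = [700, 1500, 5000]
picks = choose_modes(frames, [3200, 800, 1600],
                     quality=lambda need, bps: bps - need, threshold=0)
print(picks)                    # [800, 1600, 3200]
print(mode_signalling_bits(3))  # 2
```

The side information is not free at these rates: at an assumed 20 ms frame, 2 bits of mode signalling per frame costs 100 bps, so switching per group of frames rather than per frame may pay off.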

Our use case is a Signal-like app that works over CubeSat links, so it's not real time with respect to voice, i.e. voice messages only.

Can you let me know what latency, i.e. processing delay range, is tolerable for your use case? PS: I am not into codec engineering, so nothing I do here is of supreme Google-engineer quality; I'm doing this because of our requirements and use case, and things can be improved later.

rafael2k commented on August 16, 2024

Thanks! Your sample is really encouraging! Do you have a fork with your advances?
I uploaded the same sample using LPCNet (I think some noise-reduction filtering before LPCNet would help):
https://abradig.org.br/vocoder/elon_decoded-lpcnet.mp3

I'm also interested in creating such an audio-sample dataset for NN training.

aluebs commented on August 16, 2024

Just wanted to mention that mp3 is its own codec that will introduce its own artifacts, so it might not be the most objective format to make a comparison.
Regarding your idea of using a neural decoder on codec2 features to improve quality, check out this paper that does exactly that.

trholding commented on August 16, 2024

@rafael2k Was that elon_groundtruth.mp3 encoded with LPCNet, decoded, and re-encoded to mp3?

At the moment I work on local copies. They are not cleaned up (messy, experiments in progress), so the code upload will have to wait ~2 weeks; I'll also have to figure out how to decouple the Lyra part from Bazel and have a CMake / Make way of building. I'll post an update here once it's done. And yes, a samples dataset would be awesome; maybe post on some forums where hams hang out? Or use WebSDRs, but then I don't like the idea of recording without consent.

@aluebs Totally agree, but there is a 25 MB limit on GitHub when I upload via the web interface, so mp3s only for now; WAVs will follow later. I've read the paper before, as it was mentioned on the LPCNet page. Is there a link to code one can play with?

@rafael2k Here are the old samples, along with a new LPCNet sample coaxed down to 800 bps.

https://github.com/trholding/audiosamples/blob/master/elon_groundtruth.mp3 --> ground truth (an elon interview 35:54s)
https://github.com/trholding/audiosamples/blob/master/elon_decoded.mp3 --> Lyra v2 coaxed to 1600 bps
https://github.com/trholding/audiosamples/blob/master/elon_lpc.mp3 --> LPCNet coaxed to 800 bps

Give it a try and let me know if the audio is workable.

aluebs commented on August 16, 2024

Unfortunately there was no code release with that paper.

rafael2k commented on August 16, 2024

@aluebs, we are using mp3 just for the sake of reducing the raw pcm size. And no, it is not introducing any noticeable distortion.

aluebs commented on August 16, 2024

You are right about the frame rate and bytes per frame. Unfortunately, there is currently no way to reduce the bitrate without re-training the vector quantizer.
Our current main use case is VoIP; that is why we chose 3 kbps. But I can see the value of a lower bitrate for HF radio use cases, so we will keep that in mind.
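Why the bitrate is tied to the trained quantizer can be seen from the bit budget of a residual vector quantizer (RVQ), the kind of quantizer used by SoundStream, on which Lyra v2 is reported to be built. Each stage codes the residual left by the previous stages as an index into a learned codebook, so bits per frame = number of stages × log2(codebook size), and bitrate = bits per frame × frames per second. This toy sketch uses tiny hand-made 1-D codebooks (hypothetical values, not Lyra's) to show that decoding fewer stages lowers the bitrate but leaves a larger error; good quality at a lower rate would need codebooks, and a decoder, trained for that rate.

```python
import math


def nearest(codebook, v):
    """Index of the codeword closest to vector v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))


def rvq_encode(v, codebooks):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = list(v)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected codewords; passing fewer indices decodes only the first stages."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


def rvq_bits_per_frame(codebooks):
    return sum(math.log2(len(cb)) for cb in codebooks)


def bitrate_bps(bits_per_frame, frames_per_second):
    return bits_per_frame * frames_per_second


codebooks = [
    [[0.0], [1.0], [2.0], [3.0]],     # stage 1: coarse
    [[-0.25], [0.0], [0.25], [0.5]],  # stage 2: refines the residual
]
x = [2.3]
idx = rvq_encode(x, codebooks)
print(idx)  # [2, 2]
for stages in (1, 2):
    y = rvq_decode(idx[:stages], codebooks)
    print(stages, rvq_bits_per_frame(codebooks[:stages]), round(abs(y[0] - x[0]), 3))
```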
