
Comments (18)

trholding avatar trholding commented on August 16, 2024 3

I also had a use case for HF. I got it down to 1600 bps with okay voice quality, 1040 bps (recognizable), and 800 bps (difficult to understand but better than nothing). Will explain when I find time; I'm still doing research and still trying to understand neural codecs. I think 1600 bps sounds much better than codec2 at its highest bitrate.

from lyra.

trholding avatar trholding commented on August 16, 2024 3

@rafael2k I think Google builds stuff for the general public and mostly for their business case. Narrow band is used by few people, like HAMs and IoT projects, so we can't blame the engineers. Yes, this could be a "Won't fix". As I understand it, at Google different parts of software projects are owned by different teams, so a request here probably won't help.

@aluebs In the future, the narrow band use case could have a high impact on the lives of millions if not billions of people without proper coverage or connectivity. So I highly encourage Google to support the narrow band use case, open up the training pipeline, and make lyra2 an RFC standard of sorts.

I also encourage you to make important projects available with a cmake / make build system, which is sort of the external standard, instead of Google-internal bazel only. I do wish that you brilliant engineers would pester management to open up projects/sources/pipelines, such as the lyra training pipeline, that could be extremely useful for the whole of humankind.

It is deceiving when your marketing team calls lyra "opensource" but a key component, i.e. the training pipeline, is closed source. Disappointed with that.

There is an opportunity to reduce network bandwidth by 5% - 20% by optimizing lyra's bit stream. I verified it experimentally; however, since the training is not open source, I do not have an incentive to share my findings. But then you can figure it out and implement the improvement by just looking at the bitstream.

I highly appreciate the work your team has done - lyra2 is brilliant, I learned something from it, thank you for that.


rafael2k avatar rafael2k commented on August 16, 2024 2

Thanks! I guessed the main use case was VoIP, makes sense. I'll study a bit more how the code works and how the NN is set up and trained.

/* And congratulations on the great work! A great contribution to society. */

EDIT: Google did not open the training pipeline, and the material used to train the NN also seems to be closed. The propaganda about Lyra quality does not match the quality of the published software + model (it is not even possible to know whether what is available on Google's blogs is fake or real).


trholding avatar trholding commented on August 16, 2024 2

@rafael2k I am aware of the 450 bps codec2 mode; it didn't cut it for my use case, but that's great work.

I have a feeling that the quality of the end result depends on pre-processing the audio (I had better results with filtered, denoised and companded input audio, for example) and on using the right kind of models, i.e. models trained on the kind of signal degradation encountered in the field, as well as on the type of NN + model (lyra is more forgiving).
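
To illustrate the "companded" part of that pre-processing, here is a minimal sketch of mu-law companding. The thread does not say which companding scheme was used, so the mu-law curve and the constant 255 are assumptions, and the function names are hypothetical. Filtering and denoising are not shown.

```python
import math

MU = 255.0  # common mu-law constant; an assumption, not taken from the thread


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Compress a sample in [-1, 1] with the mu-law curve."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Invert mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


# Quiet samples are boosted much more than loud ones before encoding.
for sample in (0.01, 0.1, 0.5, 1.0):
    compressed = mu_law_compress(sample)
    restored = mu_law_expand(compressed)
    print(f"{sample:.2f} -> {compressed:.3f} -> {restored:.4f}")
```

Companding like this raises low-level detail relative to peaks, which may be one reason a codec behaves better on such input; whether it helps with Lyra in particular would need to be checked by listening.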

The biggest challenge is to be able to run in real time or faster on cpu and cpu constrained devices, a bit of opencl is okay I guess as most SoCs come with gpus.

Let's close this, as the subject is becoming less about lyra, let's move this conversation to your git or mine? trholding/audiosamples#1

@aluebs Thanks for the great work and pointers.


rafael2k avatar rafael2k commented on August 16, 2024 1

At 1600 bps, does it sound better than LPCNet (which is also 1600 bps)? Comparing it to codec2 is not fair, as codec2 is not ML-based.


trholding avatar trholding commented on August 16, 2024 1

@rafael2k I have attached two files. Ground truth and encoded sound converted to mp3.

https://github.com/trholding/audiosamples/blob/master/1600bps_GT.zip

I've been changing stuff a bit more, so the current output at 1600 bps which I have attached sucks a bit. But I do believe that the lyra project itself won't go lower in bps, as this is no longer voice chat quality.

I'll check the LPCNet output... I've had some outputs in /tmp but I cleared them out. If I remember correctly, encoding was fast, decoding not so much. Will convert, check and get back.

Yeah one can't compare it with codec2 which is awesome, my hope is to encourage @drowe67 to consider creating codec3.

I think if I go way down in bps, even with degraded output, and pass it to a neural decoder that is trained on both degraded (ham transmission, extremely noisy audio etc) and clean audio, we could have good speech at very low bps, which could probably be sent over ?FT8 transatlantic or ?EME bounce :).

Real time may be possible... Maybe even voice cloning could help - send voice vectors for the final stage to get the audio output right - a bit complicated, still wrapping my head around AI / ML stuff. What sucks is every reference is in Python and I prefer C/C++.

I wish some HAMs could donate hours of their clean speech and speech with noise to do some training on - I must also think of getting sponsored GPU compute cos I heard it's quite expensive to train.

I am also working on on-the-fly codec switching every few tens of ms / frames based on audio input and output quality, i.e. variable codec and bitrate (kludge). I'm also playing with a bit of filtering and frequency shifting and checking quality; it's a trial and error process. I hope to release something in two weeks; maybe lyra itself could make use of some of the tricks.
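
One way to picture per-frame switching is a simple rule that picks a bitrate from each frame's energy. This is only a sketch under stated assumptions: the energy rule and the thresholds are hypothetical and are not the method described here, which also considers output quality. The three rates reuse the 800 / 1040 / 1600 bps figures mentioned elsewhere in this thread.

```python
import math
from typing import List, Sequence

# Rate ladder in bps (figures mentioned in this thread) and hypothetical
# RMS thresholds separating them; the thresholds are illustrative only.
RATES_BPS = (800, 1040, 1600)
THRESHOLDS = (0.01, 0.05)


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def pick_rate(frame: Sequence[float]) -> int:
    """Choose a bitrate for this frame: louder frames get more bits."""
    level = frame_rms(frame)
    for threshold, rate in zip(THRESHOLDS, RATES_BPS):
        if level < threshold:
            return rate
    return RATES_BPS[-1]


def plan_rates(samples: Sequence[float], frame_len: int) -> List[int]:
    """Split samples into fixed-length frames and pick a rate for each."""
    return [pick_rate(samples[i:i + frame_len])
            for i in range(0, len(samples), frame_len)]


# Silence, quiet speech, loud speech (4 samples per frame for brevity).
samples = [0.0] * 4 + [0.03] * 4 + [0.5] * 4
print(plan_rates(samples, 4))  # [800, 1040, 1600]
```

A real switcher would also need to signal the chosen rate to the decoder for each frame, which costs bits of its own.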

Our use case is a Signal-like app that works over Cubesat links, so it's not real time wrt voice, i.e. voice messages only.

Can you let me know what latency, i.e. processing delay range, is tolerable for your use case? PS. I am not into codec engineering and stuff, so none of what I do here is supreme Google Engineer quality; I'm doing this because of our requirements and use case - things can be improved later.


rafael2k avatar rafael2k commented on August 16, 2024 1

Thanks! Your sample is really encouraging! Do you have a fork with your advances?
I uploaded the same using LPCNet (I think some noise reduction filter prior to LPCNet would help):
https://abradig.org.br/vocoder/elon_decoded-lpcnet.mp3

I'm also interested in creating such audio samples data-set for NN training.


aluebs avatar aluebs commented on August 16, 2024 1

Just wanted to mention that mp3 is its own codec that will introduce its own artifacts, so it might not be the most objective format to make a comparison.
Regarding your idea of using a neural decoder on codec2 features to improve quality, check out this paper that does exactly that.


trholding avatar trholding commented on August 16, 2024 1

@rafael2k Was that elon_groundtruth.mp3 encoded to lpcnet and decoded, reencoded to mp3?

At the moment I work on local copies. They are not cleaned up / messy / experiments in progress, so the code upload will have to wait for ~2 weeks. I'll also have to figure out how to decouple the lyra part from bazel and have a cmake / make way of building. I'll post an update here once it's done. And yes, a samples dataset would be awesome; maybe post on some forums where HAMs hang out? Or use WebSDRs, but then I don't like the idea of recording without consent.

@aluebs Totally agree, but 25 Meg limit on github when I upload via interface - so mp3s only, wavs will follow later. I've read the paper before as it was mentioned on the LPCNet page. Is there a link to code that one can play with?

@rafael2k Here are the old samples along with the new sample from lpcnet coaxed to 800 bps.

https://github.com/trholding/audiosamples/blob/master/elon_groundtruth.mp3 --> ground truth (an elon interview 35:54s)
https://github.com/trholding/audiosamples/blob/master/elon_decoded.mp3 --> Lyra v2 coaxed to 1600 bps
https://github.com/trholding/audiosamples/blob/master/elon_lpc.mp3 --> LPCNet coaxed to 800 bps

Give it a try and let me know if audio is workable.


aluebs avatar aluebs commented on August 16, 2024 1

Unfortunately there was no code release with that paper.


rafael2k avatar rafael2k commented on August 16, 2024 1

@aluebs, we are using mp3 just for the sake of reducing the raw pcm size. And no, it is not introducing any noticeable distortion.


aluebs avatar aluebs commented on August 16, 2024

You are right about the frame rate and bytes per frame. Unfortunately currently there is no way to reduce the bit rate without re-training the vector quantizer.
Our current main use case is VoIP, that is why we chose 3kbps. But I can see the value of lower bit rate for HF radio use cases, so we will have that in mind.
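
For readers following the arithmetic, a fixed-frame codec's bitrate is the frame rate times the bits per frame. A minimal sketch, assuming 50 frames per second (20 ms frames) and 8 bytes per frame; these figures are not stated in this thread and are chosen because they give 3200 bps, close to the 3 kbps mentioned above:

```python
def bitrate_bps(frames_per_second: float, bytes_per_frame: float) -> float:
    """Bitrate in bits per second for a codec with fixed-size frames."""
    return frames_per_second * bytes_per_frame * 8


def bytes_per_frame_for(target_bps: float, frames_per_second: float) -> float:
    """Bytes each frame may carry to hit a target bitrate."""
    return target_bps / (frames_per_second * 8)


# Assumed values (not from this thread): 20 ms frames, 8 bytes each.
ASSUMED_FPS = 50
ASSUMED_BYTES_PER_FRAME = 8

print(bitrate_bps(ASSUMED_FPS, ASSUMED_BYTES_PER_FRAME))  # 3200
print(bytes_per_frame_for(1600, ASSUMED_FPS))             # 4.0
print(bytes_per_frame_for(800, ASSUMED_FPS))              # 2.0
```

Under these assumptions, halving the bitrate at the same frame rate means halving the bytes per frame, which is why the quantizer has to be re-trained for a smaller code size.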


rafael2k avatar rafael2k commented on August 16, 2024

@trholding, I listened to the samples and managed to do my own tests with LPCNet at rates below 1 kbps using David Rowe's LPCNet branch. It does work. The voice sounds like a drunk person's but is definitely understandable. I would say "communications quality".
Lyra v2 at 1.6 kbps indeed is a step forward and seems better than LPCNet! I need to do more subjective tests to be sure, but anyway, this is really great!

I was able to test the Fraunhofer NESC codec at 1.6 kbps, and from my (subjective) analysis, it is a big step forward, being much more robust to noise than LPCNet while keeping the bitrate low and the quality high.
Here is the NESC paper:
https://arxiv.org/pdf/2207.03282.pdf
There is no open source code, but some samples here for the 3 kbps version:
https://fhgspco.github.io/nesc/


rafael2k avatar rafael2k commented on August 16, 2024

> Just wanted to mention that mp3 is its own codec that will introduce its own artifacts, so it might not be the most objective format to make a comparison. Regarding your idea of using a neural decoder on codec2 features to improve quality, check out this paper that does exactly that.

Btw, just remembering the creation of the 450 bps codec2 mode:
https://ieeexplore.ieee.org/abstract/document/8910691
https://www.rowetel.com/wordpress/?p=6212


rafael2k avatar rafael2k commented on August 16, 2024

> @rafael2k Was that elon_groundtruth.mp3 encoded to lpcnet and decoded, reencoded to mp3?

Forgot to answer - Yes.


rafael2k avatar rafael2k commented on August 16, 2024

I think we should close this as "Won't fix", as the Lyra training pipeline is closed source as far as I know, and Google does not give a sh** about the narrow band use case. Please advise if I'm wrong.


aluebs avatar aluebs commented on August 16, 2024

Thank you for the thoughtful and specific feedback. I can empathize with your frustration and I wish I had more satisfactory answers, since we are all quite passionate about contributing to the open-source community.
There are good arguments for reducing Lyra's bitrate even further and opening up the training pipeline, but unfortunately at this point these aren't the team's priorities.


rafael2k avatar rafael2k commented on August 16, 2024

So I'm closing this issue I opened some time ago as "Google does not give a sh*t" and "training pipeline is closed source even if google says in the media Lyra is open source".

