Text To Speech Support about macchina (CLOSED)

macchina-cli commented on September 7, 2024
Text To Speech Support

from macchina.

Comments (23)

uttarayan21 commented on September 7, 2024

Sure, no problem! Good luck!

grtcdr commented on September 7, 2024

Your bindings work! 🚀 😃

uttarayan21 commented on September 7, 2024

Maybe I should update the dependencies and add proper directions.

uttarayan21 commented on September 7, 2024

Then I guess I'll publish it on crates.io.
I just published it on crates.io as google_speech.

uttarayan21 commented on September 7, 2024

Well, I have a library, or more specifically Rust bindings to a Python library, https://github.com/uttarayan21/google_speech_rs, that makes it super easy to do TTS. But I haven't tested it super extensively.

Also you swapped the () [] around in the link lol

grtcdr commented on September 7, 2024

> Also you swapped the () [] around in the link lol

😆

> Well, I have a library, or more specifically Rust bindings to a Python library, https://github.com/uttarayan21/google_speech_rs, that makes it super easy to do TTS. But I haven't tested it super extensively.

I'll be allocating a little bit of time to try and get this to work, and see if we can have this feature out with the next release.

grtcdr commented on September 7, 2024

Is it on crates.io?

uttarayan21 commented on September 7, 2024

No, I didn't really publish it since I didn't test it extensively.

grtcdr commented on September 7, 2024

I'll just clone and test it locally then, thanks for sharing! 🚀

grtcdr commented on September 7, 2024

Any dependencies I should be installing? I'm getting a ton of build errors. Ignore this.

grtcdr commented on September 7, 2024

Google's guide mentions that one should install sox and libsox-fmt-mp3. Installing sox by itself and running your example results in:

sox FAIL formats: no handler for given file type `mp3'
PyErr { type: <class 'RuntimeError'>, value: RuntimeError(), traceback: Some(<traceback object at 0x7f00cbe21440>) }

libsox-fmt-mp3 is not available in the official Arch repos or the AUR.

A quick sudo find / -name "*sox*" finds:

...
/usr/lib/sox/libsox_fmt_mp3.so
...

So Arch does package the libraries required by google's speech python library, but unfortunately it doesn't work.

uttarayan21 commented on September 7, 2024

Optional Deps : libao: for ao plugin
                libmad: for mp3 plugin [installed]
                libid3tag: for mp3 plugin [installed]
                wavpack: for wavpack plugin
                libpulse: for pulse plugin [installed]
                opusfile: for opus plugin
                twolame: for mp3 plugin [installed]

The packages needed for mp3 are listed as optional dependencies of sox: libmad, twolame, and libid3tag.
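
To check a listing like the one above without reading it by eye, it can be parsed into entries and filtered by plugin. This is only a sketch: `OptionalDep`, `parse_optional_deps`, and `missing_for_plugin` are hypothetical names, and the line format (`name: for X plugin [installed]`) is assumed from the output pasted in this comment.

```rust
/// One entry of a pacman-style "Optional Deps" listing (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct OptionalDep {
    pub name: String,
    pub purpose: String,
    pub installed: bool,
}

/// Parses lines such as `libmad: for mp3 plugin [installed]`.
/// A leading `Optional Deps :` label, if present, is stripped first.
/// Lines without a `name: description` shape are skipped.
pub fn parse_optional_deps(listing: &str) -> Vec<OptionalDep> {
    listing
        .lines()
        .filter_map(|line| {
            let line = line.trim();
            let line = line
                .strip_prefix("Optional Deps")
                .map(|rest| rest.trim_start().trim_start_matches(':').trim_start())
                .unwrap_or(line);
            let (name, rest) = line.split_once(':')?;
            let rest = rest.trim();
            let installed = rest.ends_with("[installed]");
            let purpose = rest.trim_end_matches("[installed]").trim();
            Some(OptionalDep {
                name: name.trim().to_string(),
                purpose: purpose.to_string(),
                installed,
            })
        })
        .collect()
}

/// Names of the optional dependencies for `plugin` that are not installed.
pub fn missing_for_plugin(deps: &[OptionalDep], plugin: &str) -> Vec<String> {
    let wanted = format!("for {} plugin", plugin);
    deps.iter()
        .filter(|d| d.purpose == wanted && !d.installed)
        .map(|d| d.name.clone())
        .collect()
}
```

Calling `missing_for_plugin(&parse_optional_deps(listing), "mp3")` on the listing above would return an empty list, since all three mp3 packages are marked as installed there.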

grtcdr commented on September 7, 2024

I missed these, sorry!

grtcdr commented on September 7, 2024

I'm all for you publishing this on crates.io, it works + it's great!

grtcdr commented on September 7, 2024

Awesome! I'll start using that right away then :)

Thanks 👍

grtcdr commented on September 7, 2024

Okay, so I managed to make it work, but I feel like it's a very hacky method...

grtcdr commented on September 7, 2024

It takes into account the configuration of course, but ignores bars as I don't see a way we can make google_speech speak "bars" lol.

A good improvement would be to work out which readout is about to be read and do some processing on the data to make the speech a lot more understandable.

For example, instead of reading each digit individually in "5.10.46-1-lts" it should read it like any normal human being lol.

We can also make it so it says "Kernel is Linux 5.10.46-1-lts" instead of "Kernel" pause "Linux 5.10.46-1-lts"
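
The preprocessing idea above could look roughly like the following. It is a minimal sketch, not what #110 does: `speakable_version` and `speakable_readout` are hypothetical helpers, and it is an untested assumption that spelling out the separators makes the engine read version strings more naturally.

```rust
/// Spells out the separators in a version-like token, so that
/// "5.10.46-1-lts" becomes "5 point 10 point 46 dash 1 dash lts".
pub fn speakable_version(token: &str) -> String {
    token.replace('.', " point ").replace('-', " dash ")
}

/// Joins a readout's label and value into a single sentence, so the
/// engine is handed one phrase instead of two separate ones.
/// Tokens that start with a digit and contain a dot are treated as
/// version strings (a heuristic chosen for this sketch).
pub fn speakable_readout(label: &str, value: &str) -> String {
    let words: Vec<String> = value
        .split_whitespace()
        .map(|word| {
            let looks_like_version =
                word.starts_with(|c: char| c.is_ascii_digit()) && word.contains('.');
            if looks_like_version {
                speakable_version(word)
            } else {
                word.to_string()
            }
        })
        .collect();
    format!("{} is {}", label.trim(), words.join(" "))
}
```

With these helpers, `speakable_readout("Kernel", "Linux 5.10.46-1-lts")` produces "Kernel is Linux 5 point 10 point 46 dash 1 dash lts", which would then be passed to the speech engine as one string.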

grtcdr commented on September 7, 2024

Here we go #110.

It doesn't actually ignore bars, but I'll fix this right away, because apparently GoogleSpeech can speak in bars, and it goes something like this:

Filled circle, filled circle, filled circle, filled circle, filled circle...

Possibly the most hilarious thing I've ever come across developing macchina.
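
One way to keep bars out of the spoken output is to drop any readout whose value consists only of bar glyphs before it reaches the speech engine. The sketch below is illustrative: the glyph set in `BAR_GLYPHS` is an assumption (adjust it to whatever the theme actually draws), and `is_bar` and `speakable_values` are hypothetical names, not macchina functions.

```rust
/// Glyphs assumed to make up a bar. This set is a guess for the sketch.
const BAR_GLYPHS: [char; 2] = ['●', '○'];

/// Returns true when `value` is non-empty and, ignoring whitespace,
/// contains nothing but bar glyphs.
pub fn is_bar(value: &str) -> bool {
    let mut glyphs = value.chars().filter(|c| !c.is_whitespace()).peekable();
    glyphs.peek().is_some() && glyphs.all(|c| BAR_GLYPHS.contains(&c))
}

/// Keeps only the (label, value) pairs whose value is not a bar.
pub fn speakable_values<'a>(readouts: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    readouts
        .iter()
        .copied()
        .filter(|(_, value)| !is_bar(value))
        .collect()
}
```

Filtering on the rendered value keeps the sketch independent of configuration; a real implementation could just as well skip bar-type readouts earlier, based on the user's settings.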

uttarayan21 commented on September 7, 2024

> Filled circle, filled circle, filled circle, filled circle, filled circle...

Haha, I never knew lol! That is hilarious.

> For example, instead of reading each digit individually in "5.10.46-1-lts" it should read it like any normal human being lol.

I think that is due to it thinking that the points are decimal places. However, no one has any control over it, since the text is sent to Google's servers and that is what they reply with.

Also, espeak and other similar offline TTS engines don't sound quite as natural, which is the reason I made this.

grtcdr commented on September 7, 2024

> I think that is due to it thinking that the points are decimal places. However, no one has any control over it, since the text is sent to Google's servers and that is what they reply with.

Ah okay!

> Also, espeak and other similar offline TTS engines don't sound quite as natural, which is the reason I made this.

I don't know too much about other TTS engines, and although some won't agree with the fact that we're using Google, if it's the most natural sounding engine, then I'm fine with having it in.

Plus, I think GoogleSpeech works offline? I tested it offline and it still reports data without an internet connection.

uttarayan21 commented on September 7, 2024

> Plus, I think GoogleSpeech works offline? I tested it offline and it still reports data without an internet connection.

If the text is the same as in a previous call, then it is cached. If you try with text that hasn't been said before, it will fail if offline.
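
The behavior described here (repeated text works offline, new text does not) matches a cache keyed by the spoken text. The following is a sketch of that idea only, not GoogleSpeech's actual implementation: `SpeechCache` is a hypothetical type, the cache lives in memory rather than on disk, and the network request is stood in for by a caller-supplied `fetch` closure.

```rust
use std::collections::HashMap;

/// In-memory cache of audio clips, keyed by the exact text that was spoken.
pub struct SpeechCache {
    clips: HashMap<String, Vec<u8>>,
}

impl SpeechCache {
    pub fn new() -> Self {
        Self { clips: HashMap::new() }
    }

    /// Returns the cached clip for `text`. `fetch` is called only on a
    /// cache miss; if it fails (for example because the machine is
    /// offline), the error is returned and nothing is cached.
    pub fn get_or_fetch<F>(&mut self, text: &str, fetch: F) -> Result<&[u8], String>
    where
        F: FnOnce(&str) -> Result<Vec<u8>, String>,
    {
        if !self.clips.contains_key(text) {
            let clip = fetch(text)?;
            self.clips.insert(text.to_string(), clip);
        }
        Ok(self.clips[text].as_slice())
    }
}
```

Under this model, a readout such as the kernel version is fetched once and replayed from the cache afterwards, while a value that changes on every run (uptime, for instance) would need a connection each time.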

grtcdr commented on September 7, 2024

I thought about that, thanks for confirming it.

grtcdr commented on September 7, 2024

Closing this as #110 was merged a while ago.
