Comments (8)

qJake avatar qJake commented on July 17, 2024 2

I ended up forking this yesterday and got it to work in Home Assistant, but I don't intend on maintaining the repository long-term.

The changes that would be necessary to support an "OpenAI TTS-Compatible" endpoint are:

  • Allow custom URL (or just custom hostname/IP and port number)
  • Don't make API Key required
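The two changes listed above can be sketched in a few lines. This is only an illustration, not the code from the fork: `build_speech_request` and its parameters are hypothetical names, and the sketch assumes the custom server accepts the same `/audio/speech` path and JSON body as OpenAI's TTS API. It builds the request without sending it.

```python
import json
import urllib.request

# Default used when no custom endpoint is configured.
OPENAI_BASE_URL = "https://api.openai.com/v1"


def build_speech_request(text, voice="alloy", model="tts-1",
                         base_url=OPENAI_BASE_URL, api_key=None):
    """Build (but do not send) a POST request for an OpenAI-style TTS endpoint.

    Hypothetical helper: base_url may point at any compatible server,
    and api_key may be omitted for servers that do not need one.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only attach the Authorization header when a key was supplied.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    url = base_url.rstrip("/") + "/audio/speech"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

With this shape, pointing at a local server is a matter of passing something like `base_url="http://192.168.1.10:8000/v1"` (a made-up address) and leaving `api_key` unset, while the defaults keep working against OpenAI itself.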

If you want to try my fork out, it should work side-by-side with this repo since I changed the entity IDs, so load this as a custom repo in HACS and give it a try:

https://github.com/qJake/openai_tts

Unfortunately I don't have enough HA/Python development experience to know how to get these changes back into this main repo while supporting both OpenAI itself and a custom endpoint.

from openai_tts.

raldone01 avatar raldone01 commented on July 17, 2024 2

@qJake I took the liberty of using your code as a base to add custom endpoint support. I also opened a pull request.

@ther3zz I also added your change to my PR. (You should keep yours open too, though.)

Do you remember why you changed mp3 to wav?

raldone01 avatar raldone01 commented on July 17, 2024 1

I think wav is the most compatible format. I will leave this feature out to keep the PR as minimal as possible.

qJake avatar qJake commented on July 17, 2024

+1 for this, there are a lot of open-source projects popping up that have "OpenAI-compatible" API endpoints for TTS - if you could let us override the host and port under some advanced settings, that would be awesome!

ther3zz avatar ther3zz commented on July 17, 2024

I ended up forking this yesterday and got it to work in Home Assistant, but I don't intend on maintaining the repository long-term.

The changes that would be necessary to support an "OpenAI TTS-Compatible" endpoint are:

  • Allow custom URL (or just custom hostname/IP and port number)
  • Don't make API Key required

If you want to try my fork out, it should work side-by-side with this repo since I changed the entity IDs, so load this as a custom repo in HACS and give it a try:

https://github.com/qJake/openai_tts

Unfortunately I don't have enough HA/Python development experience to know how to get these changes back into this main repo while supporting both OpenAI itself and a custom endpoint.

I would look into allowing for a custom model to be specified as well

qJake avatar qJake commented on July 17, 2024

I would look into allowing for a custom model to be specified as well

By model, do you mean speaker?

Yes, currently the AllTalk v2 Beta that I'm using supports the OpenAI API as a drop-in replacement, in which it has support for mapping an xTTS voice to one of the 6 supported OpenAI voices:

image

This suited my needs enough - I don't need more than 6 distinct voices.

However, yes, you are correct - generally speaking, it would be nice to have a customizable field for speaker and/or be able to change that on the fly as part of the assistant configuration rather than having to create multiple integrations for multiple speakers.
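To make the voice-mapping idea concrete, here is a rough sketch of how a mapping plus a free-form speaker field could fit together. Everything in it is an assumption for illustration: `VOICE_MAP`, the `.wav` file names, and `resolve_voice` are hypothetical and are not taken from AllTalk or from this integration. Only the six OpenAI voice names are real.

```python
# The six voice names the OpenAI TTS API offered at the time of this thread.
OPENAI_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")

# Hypothetical table mapping OpenAI voice names to local voice samples.
VOICE_MAP = {
    "alloy": "female_01.wav",
    "onyx": "male_01.wav",
}


def resolve_voice(requested, voice_map=VOICE_MAP, allow_custom=True):
    """Return the backend voice for a requested speaker name.

    Mapped names are translated; any other name is passed through
    unchanged when allow_custom is True, otherwise it is rejected.
    """
    if requested in voice_map:
        return voice_map[requested]
    if allow_custom:
        return requested
    raise ValueError(f"Unknown voice: {requested!r}")
```

With `allow_custom=True`, a user could type any speaker name the backend understands instead of being limited to the six presets.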

ther3zz avatar ther3zz commented on July 17, 2024

I would look into allowing for a custom model to be specified as well

By model, do you mean speaker?

Yes, currently the AllTalk v2 Beta that I'm using supports the OpenAI API as a drop-in replacement, in which it has support for mapping an xTTS voice to one of the 6 supported OpenAI voices:

image

This suited my needs enough - I don't need more than 6 distinct voices.

However, yes, you are correct - generally speaking, it would be nice to have a customizable field for speaker and/or be able to change that on the fly as part of the assistant configuration rather than having to create multiple integrations for multiple speakers.

I submitted a PR which sets the custom_value to true. That basically would allow more flexibility.

qJake avatar qJake commented on July 17, 2024

Do you remember why you changed mp3 to wav?

@raldone01 I found that most of the OpenAI-compatible open-source projects (like AllTalk) default to .wav so it was easier to change there. However, AllTalk does support audio transcoding (not sure what performance penalty this incurs, if any, though) -

image

On this front, we have three options I believe:

  1. Offer a dropdown that defaults to mp3 and lets the user choose the file type returned by the custom API,
  2. Detect the file type automatically within the extension (probably difficult, but maybe not?), or
  3. Leave it hardcoded to mp3 and let custom API users handle transcoding.
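For option 2, detection may be less difficult than it sounds if only wav and mp3 need to be told apart, since both can usually be recognised from the first few bytes of the response. The sketch below is a heuristic under that assumption; `sniff_audio_format` is a hypothetical name, and the MPEG frame-sync check can also match other MPEG-style streams, so it is not a full format validator.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess 'wav' or 'mp3' from the leading bytes; 'unknown' otherwise."""
    # WAV files are RIFF containers: "RIFF" <4-byte size> "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 files often start with an ID3v2 tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise look for an MPEG audio frame sync: 11 set bits.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

A caller could fall back to the configured default (mp3) whenever the result is `"unknown"`, which keeps option 3's behaviour for anything the check does not recognise.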
