Comments (21)

LostRuins avatar LostRuins commented on August 28, 2024 2

Thanks both for the great support.

from alltalk_tts.

LostRuins avatar LostRuins commented on August 28, 2024 1

Hi, KoboldCpp dev here. The reason it's not documented is that it's supposed to be a directly compatible drop-in for the XTTS API server: https://github.com/daswer123/xtts-api-server/tree/main

There are only 2 endpoints in use. The first is calling GET /speakers_list and saving the list of voice_labels, which is a simple array of strings.

Then, the POST TTS payload Kobold uses is really basic. It's a simple JSON object with 3 fields, sent to /tts_to_audio/

{
	"text": "speech prompt",
	"speaker_wav": "voice_label",
	"language": "EN"
}
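
For anyone wanting to try these two calls outside of Kobold, they could be sketched roughly as follows. This is only an illustration using Python's standard library, not KoboldCpp's actual code: the base URL and port, the function names, and the assumption that the POST response body is the generated audio are all mine, not taken from this thread.

```python
import json
import urllib.request

# Assumption: address of a locally running XTTS-compatible server.
BASE_URL = "http://127.0.0.1:8020"


def build_speakers_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /speakers_list request (returns an array of voice labels)."""
    return urllib.request.Request(f"{base_url}/speakers_list", method="GET")


def build_tts_request(
    text: str,
    voice_label: str,
    language: str = "EN",
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build the POST /tts_to_audio/ request with the three-field JSON payload."""
    payload = {"text": text, "speaker_wav": voice_label, "language": language}
    return urllib.request.Request(
        f"{base_url}/tts_to_audio/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, base_url: str = BASE_URL) -> bytes:
    """Fetch the voice list, then request speech using the first voice.

    Needs a running server. The response body is returned unchanged;
    this sketch assumes it is the generated audio.
    """
    with urllib.request.urlopen(build_speakers_request(base_url)) as resp:
        voices = json.load(resp)
    tts_request = build_tts_request(text, voices[0], base_url=base_url)
    with urllib.request.urlopen(tts_request) as resp:
        return resp.read()
```

Nothing is sent until `synthesize("speech prompt")` is called, so the request builders can be inspected without a server running.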

I have not yet had a chance to explore your project. May I know why this default payload will not work?

Edit: From the various comments I've seen, is this supposed to be XTTS compatible?

erew123 avatar erew123 commented on August 28, 2024 1

Basically because when I initially built that API portion, it was specifically integrating with HTML web forms. Keeping it that way has kept things simple, as it's now used in various large projects and changing it would screw up quite a lot of other people's things.

I can have a shot at building a colab if you really want?

There is, however, an easy setup routine, with a video and instructions.

It does need 6-8GB of disk space (as I recall), but it will set up a custom standalone Python environment to run AllTalk. There's not really anything you have to do, and when you are finished you can simply delete the entire cloned folder.

Would that work or do you want me to attempt a colab?

Thanks

erew123 avatar erew123 commented on August 28, 2024 1

Hi @illtellyoulater

Just to confirm what I said on the PR, I've given your update to Kobold a test locally and it seems to work great! I've confirmed this to @LostRuins, and I've also made reference and given thanks to both LostRuins and yourself in the changelog for AllTalk #25 (happy to put you as a proper @ if that's OK with you).

All the best and thanks for working on this. Apologies if I was slower to respond or a bit distracted; I have had things going on IRL that, well, I'm not going to discuss on the internet.

Thanks

illtellyoulater avatar illtellyoulater commented on August 28, 2024 1

@erew123 I hope everything is getting better on your side, and yes, I'd be honored to be added to the AllTalk changelog with a proper @ 👍 :) Thank you both!

erew123 avatar erew123 commented on August 28, 2024 1

@illtellyoulater You're listed in the changelog and on the release page. https://github.com/erew123/alltalk_tts/releases/tag/1.9 thanks again.

erew123 avatar erew123 commented on August 28, 2024

Hi @rktvr

I had a look through the link you sent and tried to hunt through their commits/code. I can't actually find a direct reference to when they added the code, only a minor update to that code (2-3 days ago). The long and short of it is that it's something they have done, and I've neither been involved nor had any contact from them. As far as I can tell it's not related to AllTalk, but it's something they would be able to implement/change, or I'd be happy to work with them on it.

Maybe you want to try contacting them and test the water, see if they have any interest and we can go from there.

rktvr avatar rktvr commented on August 28, 2024

yeah i tried looking too and found pretty much nothing either about it. i asked on their discord, but doesn't seem like much will happen there as they recommend to use a different xtts server. shame because i prefer this one far more.

erew123 avatar erew123 commented on August 28, 2024

Yeah, I don't see anything to do with developing add ins or any such thing, it looks like everything is directly within the codebase of Kobold, rather than extensions. hey ho!

tau0-deltav avatar tau0-deltav commented on August 28, 2024

Kobold AI Lite is the .embd thing that's being developed here:

https://github.com/LostRuins/lite.koboldai.net

illtellyoulater avatar illtellyoulater commented on August 28, 2024

As noted in #103 (comment), KoboldCpp's recent XTTS support is leveraging https://github.com/daswer123/xtts-api-server, which is also the XTTS endpoint used by SillyTavern.

However, from my experience so far, AllTalk seems to be a better XTTS implementation (more robust and better performing), so it'd be cool if it also supported KoboldCpp (and SillyTavern).

@erew123 can we reopen this issue and discuss this further?

erew123 avatar erew123 commented on August 28, 2024

Hi @illtellyoulater

As mentioned on the other ticket, SillyTavern is already supported (as long as you have an up-to-date SillyTavern).

Re Kobold: currently their implementation of XTTS is within their main Kobold file https://github.com/LostRuins/lite.koboldai.net/blob/main/index.html (if you search for XTTS you will see lots of references spread throughout the document).

Exactly how all those parts interact and work together in that index.html document of theirs, I'm not sure, as (best I can tell) there is no developer guide.

So is it possible for me to reverse engineer their code and make up an implementation? Theoretically, yes. However, because it's deep within the main part of the Kobold codebase, if I alter it and get it working and they don't merge that change, the only way I can keep it integrated is to maintain a local version of Kobold and update their code every time they make an update. That isn't exactly ideal, as it could result in a lot of work for me, depending on how often they change their code, and even more so if they rearrange things.

Just to clarify on that, with SillyTavern they provide a development guide and you build an extension file https://github.com/SillyTavern/SillyTavern/tree/release/public/scripts/extensions/tts that has no impact on the main code of SillyTavern. So when I wrote the extension for that, it was easy for them to review and, if necessary, drop or debug the extension, as they know it's a separate chunk of code.

So with Kobold's nod in the direction that they aren't interested in AllTalk being added to the code, a lack of development documentation/support, plus the time it would take me to reverse engineer and implement in their index.html code, I've naturally been reluctant to do anything in this direction.

You're welcome to ask them again if they are interested. I'd be willing to attempt it (as long as I know they are willing to help get it working and to review/accept any code).

Hope that explains things a bit better.

illtellyoulater avatar illtellyoulater commented on August 28, 2024

After fixing the inconsistency described in #103, I went through KoboldCpp's klite.embd code and ended up implementing the remaining code for supporting TTS generation with AllTalk myself. It wasn't about reverse engineering... just about adding the missing code :)

My PR for that is at LostRuins/koboldcpp#719

erew123 avatar erew123 commented on August 28, 2024

Hi @LostRuins

Thanks for messaging! Danil (who wrote the XTTS server) and I developed our own servers around the same time, both based on the Coqui AI text-to-speech model. Coqui provided no default API settings/suite and no API standard, so we each developed our own API suite independently of one another, and there is therefore no cross-compatibility.

It would be similar to asking why OpenAI GPT doesn't use exactly the same API suite as Google Gemini (or vice versa); both companies have done their own thing and gone off in their own directions.

So within the AllTalk API suite, https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-api-suite-and-json-curl I have provided quite a variety of other settings and features such as a Narrator e.g. https://vocaroo.com/18nrv7FR6wuA

Some of these features are provided through different API calls, however, the main API call for actually generating TTS is quite open and does require a variety of things be sent over https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-example-command-lines-standard-generation

This is because quite a few other projects use AllTalk for a variety of other things and so the API grew somewhat. However a breakdown of an example CURL/API call is as follows:

curl -X POST "http://127.0.0.1:7851/api/tts-generate" -d "text_input=*This is text spoken by the narrator* \"This is text spoken by the character\". This is text not inside quotes." -d "text_filtering=standard" -d "character_voice_gen=female_01.wav" -d "narrator_enabled=true" -d "narrator_voice_gen=male_01.wav" -d "text_not_inside=character" -d "language=en" -d "output_file_name=myoutputfile" -d "output_file_timestamp=true" -d "autoplay=true" -d "autoplay_volume=0.8"

This call gives you multiple ways to filter text to clean it up (or not), and sets the voice for the main character. It enables or disables the narrator for this call, sets the narrator voice to use and, when using the narrator, sets how to handle text that cannot be identified as either narrator or character. It also sets the language, the output file name you wish to use, and whether to add a timestamp to the output (if you don't want to keep overwriting the same wav file). Autoplay and autoplay volume are typically used by people who want to send remote requests to their server and have the server play the TTS through that machine's speakers (these would typically be false if you are going to pull back the wav file and play it within a web browser, as the JSON return will list 3 ways to pull back the wav file that was generated). All these base parameters are required when calling the AllTalk API.
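
For readers calling the endpoint from code rather than curl, the same form-encoded request might be assembled along these lines. This is a minimal sketch using only Python's standard library: the URL and field names are copied from the curl example above, while `build_generate_request` and its default arguments are hypothetical names chosen for illustration.

```python
import urllib.request
from urllib.parse import urlencode

# URL taken from the curl example above.
ALLTALK_URL = "http://127.0.0.1:7851/api/tts-generate"


def build_generate_request(
    text: str,
    character_voice: str = "female_01.wav",
    narrator_voice: str = "male_01.wav",
    narrator_enabled: bool = False,
    autoplay: bool = False,
    url: str = ALLTALK_URL,
) -> urllib.request.Request:
    """Build a form-encoded POST carrying every base parameter from the curl call."""
    fields = {
        "text_input": text,
        "text_filtering": "standard",
        "character_voice_gen": character_voice,
        # Booleans are sent as the strings "true"/"false", as in the curl example.
        "narrator_enabled": str(narrator_enabled).lower(),
        "narrator_voice_gen": narrator_voice,
        "text_not_inside": "character",
        "language": "en",
        "output_file_name": "myoutputfile",
        "output_file_timestamp": "true",
        # False leaves playback to the client instead of the server's speakers.
        "autoplay": str(autoplay).lower(),
        "autoplay_volume": "0.8",
    }
    return urllib.request.Request(
        url,
        data=urlencode(fields).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running AllTalk server; building it separately makes the encoded body easy to inspect first.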

{"status":"generate-success","output_file_path":"C:\\text-generation-webui\\extensions\\alltalk_tts\\outputs\\myoutputfile_1704141936.wav","output_file_url":"http://127.0.0.1:7851/audio/myoutputfile_1704141936.wav","output_cache_url":"http://127.0.0.1:7851/audiocache/myoutputfile_1704141936.wav"}
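
A client that wants to play the result itself could pick the URL out of a response like the one above. The following is only a sketch: the sample is the response shown in this comment, `playable_url` is a hypothetical helper, and treating any status other than "generate-success" as a failure is my assumption rather than documented behaviour.

```python
import json

# The example response from above, reflowed over several lines.
SAMPLE_RESPONSE = r'''{
  "status": "generate-success",
  "output_file_path": "C:\\text-generation-webui\\extensions\\alltalk_tts\\outputs\\myoutputfile_1704141936.wav",
  "output_file_url": "http://127.0.0.1:7851/audio/myoutputfile_1704141936.wav",
  "output_cache_url": "http://127.0.0.1:7851/audiocache/myoutputfile_1704141936.wav"
}'''


def playable_url(response_text: str) -> str:
    """Return the URL a browser or audio player could fetch the wav file from."""
    result = json.loads(response_text)
    # Assumption: anything other than "generate-success" means generation failed.
    if result.get("status") != "generate-success":
        raise RuntimeError(f"TTS generation failed: {result.get('status')!r}")
    return result["output_file_url"]


print(playable_url(SAMPLE_RESPONSE))
```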

Further documentation is all on the front page of the Github https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-request-parameters

There is a separate API call for streaming audio (due to the way streaming works) https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-tts-generation-endpoint-streaming-generation

Then built into the backend of AllTalk is a management page for setting the base start-up settings https://raw.githubusercontent.com/erew123/screenshots/main/settingsanddocs.jpg

I see that @illtellyoulater has sent you a PR. I've only had time to briefly look through it and I've not assessed how it interplays with the rest of your code. The only thing I note is that autoplay is set to true, which means AllTalk is playing the audio and therefore I assume Kobold isn't playing the resulting wav files that are generated. This of course takes control of playback out of the hands of Kobold. I'm not sure how you handle it?

If you just wanted streaming audio, you can of course just use the streaming audio API; however, this doesn't support the Narrator function.

Happy to chip in/help/answer questions. Apologies for a long answer.

Thanks

erew123

LostRuins avatar LostRuins commented on August 28, 2024

Do you have a version of your server that can be run via colab?

erew123 avatar erew123 commented on August 28, 2024

@LostRuins I don't currently have one. Is that a deal-breaker for the Kobold community? Or do you mean just to simplify testing etc.?

LostRuins avatar LostRuins commented on August 28, 2024

Not really a deal breaker - but yeah it would be helpful to have an easy to use setup that I can use to speed up testing.

Btw I am a little puzzled why you'd pick the form-data approach as opposed to a JSON payload.

LostRuins avatar LostRuins commented on August 28, 2024

I think if you had an easy-to-use colab, it would be very helpful not just for me, but for others who might want to explore your API.

Here is a reference from XTTS: https://colab.research.google.com/drive/1b-X3q5miwYLVMuiH_T73odMO8cbtICEY?usp=sharing It starts with a few clicks and the API is ready to use, loaded with some sample voices.

KoboldCpp also has a one click colab - https://colab.research.google.com/github/LostRuins/koboldcpp/blob/concedo/colab.ipynb

illtellyoulater avatar illtellyoulater commented on August 28, 2024

Nice to see you guys are collaborating on this!

As @erew123 noted, in the PR I had autoplay set to true so the audio was actually played by the AllTalk endpoint.

However, I just tested setting autoplay=false, and the audio is correctly played by the browser with no other changes required.

To reflect that we now have a minimal but working starting point, I will edit the PR to set autoplay=false so that you guys will also be able to quickly test all the moving parts.

erew123 avatar erew123 commented on August 28, 2024

Hi @LostRuins, no probs. Give me a couple of days on this as I'm caught up elsewhere currently. I did have a quick shot at it; however, there is a dependency resolver issue to sort out. I will get back to you soon.

Thanks

LostRuins avatar LostRuins commented on August 28, 2024

Cool. I did a tentative integration (without testing) so you can use this if you wanna test the colab. See LostRuins/koboldcpp#719 (comment)
