grvydev / s.a.t.u.r.d.a.y
A toolbox for working with WebRTC, Audio and AI
License: MIT License
Are you building an LLM, or an interface for using the system?
Hi there,
Nice project,
I tried to do something similar to this project some time ago and found faster-whisper to replace whisper.cpp.
https://github.com/guillaumekln/faster-whisper
It dramatically reduces STT duration and might be interesting for this project. I know it works in Python, but I'm unsure about Go.
Might want to take a look.
Have a nice day.
I'm interested in trying this out and tinkering with it on my system, but at the Python install step I get:
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/private/var/folders/j1/ydw6655510z_5szr12ptrhn40000gn/T/pip-install-oi62efds/llvmlite_6ee7e6f42e224bccba1d315e951338df/setup.py", line 55, in <module>
_guard_py_ver()
File "/private/var/folders/j1/ydw6655510z_5szr12ptrhn40000gn/T/pip-install-oi62efds/llvmlite_6ee7e6f42e224bccba1d315e951338df/setup.py", line 52, in _guard_py_ver
raise RuntimeError(msg.format(cur_py, min_py, max_py))
RuntimeError: Cannot install on Python version 3.11.4; only versions >=3.7,<3.11 are supported.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Despite installing Python 3.10.11 with pyenv, I'm still getting this error. I think I need to somehow nuke 3.11.4 completely from my system (on macOS).
It seems like the client-side functions getMedia() and getDevices() crash with an uncaught exception if you don't have a webcam. I was able to hack around it by setting video: false in both of these functions.
If so, what are the installation commands?
Hi @GRVYDEV, I was curious about why you decided to send text from the TTT to the TTS in chunks, and hence audio chunks from the TTS to the browser client.
Why not get the entire text from TTT --> TTS and the entire audio from TTS --> browser client? Is it to account for long texts that might need to be synthesized by SATURDAY, hitting some bottleneck somewhere in the pipeline?
Or is it to minimize latency? I guess with chunked text and audio, SATURDAY could start speaking as soon as the TTT has generated some text, without waiting for the entire piece of text to be generated.
Thanks!
We recently moved to using the slog package. I'd like a custom handler implemented to make our output look closer to this:
[2023-06-05 08:58:39.477] [INFO] [peer.go:243] => PeerLocal trickle peer_id=6FQaOl9g2S v=0
Converting transcription to bubble chat.
@GRVYDEV
To make it friendly and readable
So we can make it a group call
TranscribedText
Hi @GRVYDEV, awesome project! Thanks for taking the time and effort to build this in public.
I wanted to test my WebRTC streaming between the browser and the Go client without the STT for now.
So, I wanted to pass the transcriptionStream as nil and I saw that there's a fixme about NewRTCConnection blowing up if a transcriptionStream is not passed in.
Could you elaborate why that happens and how I could fix it?
Right now, I get the log "...transcription relay is disabled" as expected, but then get a "websocket: bad handshake" error.