grvydev / s.a.t.u.r.d.a.y

A toolbox for working with WebRTC, Audio and AI

License: MIT License

Go 78.07% JavaScript 11.28% HTML 2.48% Dockerfile 2.92% Shell 0.07% Makefile 2.60% Python 2.58%

ai audio golang opus opus-codec webrtc whisper-cpp

s.a.t.u.r.d.a.y's Issues

LLM part of this project?

Are you building an LLM yourself, or an interface for using the system?

(Idea) Replace whisper.cpp with faster-whisper

Hi there,

Nice project!
I tried to build something similar some time ago and found faster-whisper as a replacement for whisper.cpp.
https://github.com/guillaumekln/faster-whisper

It dramatically reduces STT duration and might be interesting for this project. I know it works in Python, but I'm unsure about Go.

It might be worth a look.
Have a nice day.

Error with Python Install

I'm interested in trying this out and tinkering with it on my system, but at the Python install step I get:

  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/j1/ydw6655510z_5szr12ptrhn40000gn/T/pip-install-oi62efds/llvmlite_6ee7e6f42e224bccba1d315e951338df/setup.py", line 55, in <module>
          _guard_py_ver()
        File "/private/var/folders/j1/ydw6655510z_5szr12ptrhn40000gn/T/pip-install-oi62efds/llvmlite_6ee7e6f42e224bccba1d315e951338df/setup.py", line 52, in _guard_py_ver
          raise RuntimeError(msg.format(cur_py, min_py, max_py))
      RuntimeError: Cannot install on Python version 3.11.4; only versions >=3.7,<3.11 are supported.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Despite installing Python 3.10.11 with pyenv, I'm still getting this error, so pip is evidently still running under 3.11.4. I think I need to somehow nuke 3.11.4 completely from my system (on macOS).

Client crashes if there is no video device

It seems the client-side functions getMedia() and getDevices() crash with an uncaught exception if you don't have a webcam. I was able to hack around it by setting video: false in both of these functions.

Can I use this on Linux?

If so, what are the installation commands?

Why chunk text and audio?

Hi @GRVYDEV, I was curious why you decided to send text from the TTT to the TTS in chunks, and in turn audio from the TTS to the browser client in chunks.
Why not pass the entire text from TTT --> TTS and the entire audio from TTS --> browser client? Is it to handle long texts that SATURDAY might need to synthesize, which could hit a bottleneck somewhere in the pipeline?

Or is it to minimize latency? I'd guess that with chunked text and audio, SATURDAY can start speaking as soon as the TTT has generated some text, rather than waiting for the entire piece of text to be generated.

Thanks!

feat: Make logging better

We recently moved to using the slog package. I'd like a custom handler implemented so that our output looks closer to this:
[2023-06-05 08:58:39.477] [INFO] [peer.go:243] => PeerLocal trickle peer_id=6FQaOl9g2S v=0

[feature]: add bubble chat

Summary

Convert the transcription into a bubble chat.
@GRVYDEV

Motivation

To make the transcription friendlier and more readable.
So that we can extend it to group calls.

Describe alternatives you've considered

Modify the response sent over the data channel
Remove TranscribedText
Store all responses on the web client
Return the text generated by text-to-text
=> Make sure not too much data is sent through the data channel during a long call.

Additional context

I have a demo that works like this.

NewRTCConnection without transcriptionStream

Hi @GRVYDEV, awesome project! Thanks for taking the time and effort to build this in public.

I wanted to test my WebRTC streaming between the browser and the Go client without the STT for now.
So I wanted to pass the transcriptionStream as nil, and I saw that there's a FIXME about NewRTCConnection blowing up if a transcriptionStream is not passed in.

Could you elaborate on why that happens and how I could fix it?

Right now, I get the log ...transcription relay is disabled as expected, but then I get a websocket: bad handshake error.