zeroone2numeral2 / stt-bot

Python 99.52% Mako 0.48%

stt-bot's Introduction

Admin commands

- /r [sample rate]: reply to a voice message to force its transcription; a specific sample rate can optionally be forced
- /parse: reply to a voice message to get its basic info, without running the transcription
- /mediainfo: mediainfo output
- /superuser: lets a given user add the bot to groups. In private chats, use it in reply to a forwarded message. In groups, use it in reply to a message (the original sender of forwarded messages is ignored)
- /config: get the contents of config.behavior
- forwarding a (non-voice) message from a user in private chat: shows the corresponding database row

superusers/admins

adding the bot to groups: both

transcription in groups

Voice messages in groups are not transcribed if:

- the chat is disabled
- the voice message is not forwarded, and the sender has opted out
- the voice message is forwarded, and the original sender has not hidden their account and has opted out
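
The three conditions above can be written as a single predicate. This is a minimal sketch, not the bot's actual code: the function name and all parameter names are hypothetical, and it only encodes the rules exactly as listed.

```python
def should_transcribe(
    chat_enabled,
    forwarded,
    sender_opted_out=False,
    original_sender_hidden=False,
    original_sender_opted_out=False,
):
    """Return True if a group voice message should be transcribed.

    All parameter names are hypothetical; they mirror the listed rules.
    """
    # Rule 1: a disabled chat never gets transcriptions.
    if not chat_enabled:
        return False
    # Rule 2: for non-forwarded voice messages, only the sender's opt-out counts.
    if not forwarded:
        return not sender_opted_out
    # Rule 3: for forwarded voice messages, the original sender's opt-out
    # applies only when their account is not hidden.
    if original_sender_hidden:
        return True
    return not original_sender_opted_out
```

Note that, as the rules are written, a forwarded voice message whose original sender has hidden their account is still transcribed, because the opt-out cannot be looked up.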

stt-bot's People

Forkers

renyhp

stt-bot's Issues

Chat model: "punctuation" property

Let each chat decide whether to include punctuation or not

Opus to flac conversion

pydub docs: https://github.com/jiaaro/pydub

about opus export (shouldn't be an issue because we would import only): https://github.com/jiaaro/pydub#ogg-exporting-and-default-codecs

voicybot conversion: https://github.com/backmeupplz/voicy/blob/d31f159ee18204587f1a1d73ed8c3d141503d3e3/helpers/urlToText.js#L76

voicybot ffmpeg command: https://github.com/backmeupplz/voicy/blob/master/helpers/flac.js#L18

Animate "..." while waiting for a result

Edit the "transcribing..." message while we are waiting for a result, and maybe show the elapsed time. Possible solution: spawn a new thread, pass it the message, then join the thread when the transcription is completed.

Or maybe VoiceMessageLocal should yield a "result" object when running long operations. The object should have a "done" property that signals when the operation is finished.
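
A minimal sketch of the thread-based idea, using only the standard library. The `edit_text` callback is a hypothetical stand-in for whatever call edits the bot's status message, and `long_operation` stands in for the transcription; neither name comes from the project.

```python
import itertools
import threading
import time


def animate(edit_text, done, interval=1.0):
    """Cycle the trailing dots until `done` is set, showing elapsed seconds."""
    start = time.monotonic()
    for dots in itertools.cycle((".", "..", "...")):
        elapsed = time.monotonic() - start
        edit_text(f"transcribing{dots} ({elapsed:.0f}s)")
        # wait() returns True as soon as the event is set, ending the loop early.
        if done.wait(interval):
            break


def run_with_animation(long_operation, edit_text, interval=1.0):
    """Run `long_operation` while a helper thread animates the status message."""
    done = threading.Event()
    worker = threading.Thread(
        target=animate, args=(edit_text, done, interval), daemon=True
    )
    worker.start()
    try:
        return long_operation()
    finally:
        # Signal the animation to stop, then join the thread as described above.
        done.set()
        worker.join()
```

An `Event` is used rather than a plain sleep so the animation thread stops right after the transcription returns instead of finishing a full interval. The interval would probably need to be large enough to stay within the API's message-edit rate limits.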

Plotting transcription durations

Export the table to a pandas DataFrame and then see here: https://realpython.com/pandas-plot-python/

New model: TranscriptionRequest

Log how long it takes to transcribe each audio, in seconds as a float (e.g. 6.7 seconds).

What the model should track:

- audio_duration
- elapsed_seconds
- success

"success" is optional: we could instead add the model instance to the session only when the transcription succeeds.

This model is useful to give the user an estimated transcription time, based on past transcriptions of audio of similar length.
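
A minimal sketch of how the estimate could be computed. A plain dataclass stands in for the database model here, and `estimate_seconds` and its `tolerance` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class TranscriptionRequest:
    """Stand-in for the proposed model; fields follow the list above."""
    audio_duration: float   # length of the audio, in seconds
    elapsed_seconds: float  # time spent transcribing, in seconds
    success: bool = True


def estimate_seconds(
    history: Iterable[TranscriptionRequest],
    audio_duration: float,
    tolerance: float = 5.0,
) -> Optional[float]:
    """Average the elapsed time of successful requests of similar duration.

    Returns None when no past request is within `tolerance` seconds.
    """
    similar = [
        request.elapsed_seconds
        for request in history
        if request.success
        and abs(request.audio_duration - audio_duration) <= tolerance
    ]
    return mean(similar) if similar else None
```

For example, with past requests of 10 s and 12 s audio that took 6.0 s and 8.0 s, an 11 s audio would get an estimate of 7.0 s.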

Test transcription quality after converting voice to FLAC

ffmpeg conversion: https://github.com/backmeupplz/voicy/blob/32c94d0e4b6114352fc64a31ae367e74ba652d42/helpers/urlToText.js#L87

encoding to pass to google: https://github.com/backmeupplz/voicy/blob/32c94d0e4b6114352fc64a31ae367e74ba652d42/engines/google.js#L231

Google's docs about which encoding to prefer: https://cloud.google.com/speech-to-text/docs/encoding#audio-encodings and https://cloud.google.com/speech-to-text/docs/best-practices

Google docs about optimizing audio files for recognition, with ffmpeg examples: https://cloud.google.com/solutions/media-entertainment/optimizing-audio-files-for-speech-to-text

ffmpeg to convert to linear16: https://medium.com/cod3/convert-speech-from-an-audio-file-to-text-using-google-speech-api-b951f4032a64

Python

ffmpeg commands from python: https://github.com/MarshalX/tgcalls/blob/7e6b5b11877fa39d6959ea429af3c6950e666768/examples/radio_as_smart_plugin.py#L63
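
A minimal sketch of calling ffmpeg from Python for the FLAC conversion, assuming the `ffmpeg` binary is on the PATH. The function names are hypothetical, and the mono, 16000 Hz defaults are assumptions to be checked against the Google docs linked above, not values taken from the project.

```python
import subprocess


def build_flac_command(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list to convert `src` to a mono FLAC file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", src,                # input file (e.g. the downloaded voice message)
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-c:a", "flac",           # encode the audio stream as FLAC
        dst,
    ]


def convert_to_flac(src, dst, sample_rate=16000):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(
        build_flac_command(src, dst, sample_rate),
        check=True,
        capture_output=True,
    )
    return dst
```

Passing the arguments as a list avoids shell quoting problems with file names, and `check=True` turns a failed conversion into an exception instead of a silent empty file.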

Catch "Wrong file_id or the file is temporarily unavailable" errors

When the API raises this error, the bot should retry the download (the getFile request made in .from_message()), sleeping a few seconds between attempts. If the error keeps happening, re-raise it.
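
A minimal sketch of such a retry loop. `download` is a hypothetical zero-argument callable wrapping the getFile request, and matching the error by its message text is an assumption; the real code would more likely catch the specific exception class raised by the Telegram library in use.

```python
import time

UNAVAILABLE = "Wrong file_id or the file is temporarily unavailable"


def download_with_retry(download, attempts=3, delay=3.0, sleep=time.sleep):
    """Call `download()`, retrying only when the 'unavailable' error is raised.

    `sleep` is injectable so the wait can be skipped in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except Exception as error:
            # Unrelated errors, or the last failed attempt, are re-raised as-is.
            if UNAVAILABLE not in str(error) or attempt == attempts:
                raise
            sleep(delay)
```

With the defaults, a file that stays unavailable costs two 3-second waits before the original error propagates to the caller.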