laion-ai / natural_voice_assistant Goto Github PK

License: MIT License

Python 58.91% Jupyter Notebook 41.09%

natural_voice_assistant's Introduction

BUD-E: A conversational and empathic AI Voice Assistant

BUD-E (Buddy for Understanding and Digital Empathy) is an open-source AI voice assistant which aims for the following goals:

replies to user requests in real-time
uses natural voices, empathy & emotional intelligence
works with long-term context of previous conversations
handles multi-speaker conversations with interruptions, affirmations and thinking pauses
runs fully local, on consumer hardware.

This project is a collaboration between LAION, the ELLIS Institute Tübingen, Collabora and the Tübingen AI Center.

This demo shows an interaction with the current version of BUD-E on an NVIDIA RTX 4090. With this setup, the voice assistant answers with a latency of 300 to 500 milliseconds.

Quick Start

Clone this repository and follow the installation guide in the readme.
Start the voice assistant by running the main.py file in the repository root.
Wait until "Listening.." is printed to the console and start speaking.

Roadmap

Altough the conversations with the current version of BUD-E already feel quite natural, there are still a lot of components and features missing what we need to tackle on the way to a truly and naturally feeling voice assistant. The immediate open work packages we'd like to tackle are as follows:

Reducing Latency & minimizing systems requirements

Quantization. Implement more sophisticated quantization techniques to reduce VRAM requirements and reduce latency.
Fine-tuning streaming TTS. TTS systems normally consume full sentences to have enough context for responses. To enable high-quality low-latency streaming we give the TTS context from hidden layers of the LLM and then fine-tune the streaming model on a high-quality teacher (following https://arxiv.org/abs/2309.11210).
Fine-tuning streaming STT. Connect hidden layers from STT and LLM system and then fine-tune on voice tasks to maximize accuracy in low-latency configurations of STT model.
End-of-Speech detection. Train and implement a light-weight end-of-speech detection model.
Implement Speculative Decoding. Implement speculative decoding to increase inference speed in particular for the STT and LLM models.

Increasing Naturalness of Speech and Responses

Dataset of natural human dialogues. Build a dataset (e.g., Youtube, Mediathek, etc.) with recorded dialogues between two or more humans for fine-tuning BUD-E.
Reliable speaker-diarization. Develop a reliable speaker-diarization system that can separate speakers, including utterances and affirmations that might overlap between speakers.
Fine-tune on dialogues. Finetune STT -> LLM -> TTS pipeline on natural human dialogues to allow the model to respond similarly to humans, including interruptions and utterances.

Keeping track of conversations over days, months and years

Retrieval Augmented Generation (RAG). Implement RAG to extend knowledge of BUD-E, unlocking strong performance gains (cp. https://www.pinecone.io/blog/rag-study/).
Conversation Memory. Enable model to save information from previous conversations in vector database to keep track of previous conversations.

Enhancing functionality and ability of voice assistant

Tool use. Implement tool use into LLM and the framework, e.g., to allow the agent to perform internet searches

Enhancing multi-modal and emotional context understanding

Incorporate visual input. Use a light-weight but effective vision encoder (e.g., CLIP or a Captioning Model) to incorporate static image and/or video input.
Continuous vision-audio responses. Similar to the (not genuine) Gemini demo it would be great if BUD-E would naturally and continuously take into account audio and vision inputs and flexibly respond in a natural manner just like humans.
Evaluate user emotions. Capture webcam images from the user to determine the user’s emotional state and incorporate this in the response. This could be an extension of training on dialogues from video platforms, using training samples where the speaker’s face is well visible.

Building a UI, CI and easy packaging infrastructure

LLamaFile. Allow easy cross-platform installation and deployment through a single-file distribution mechanism like Mozilla’s LLamaFile.
Animated Avatar. Add a speaking and naturally articulating avatar similar to Meta’s Audio2Photoreal but using simpler avatars using 3DGS-Avatar [https://neuralbodies.github.io/3DGS-Avatar/].
User Interface. Capture the conversation in writing in a chat-based interface and ideally include ways to capture user feedback.
Minimize Dependencies. Minimize the amount of third-party dependencies.
Cross-Platform Support. Enable usage on Linux, MacOS and Windows.
Continuous Integration. Build continuous integration pipeline with cross-platform speed tests and standardized testing scenarios to track development progress.

Extending to multi-language and multi-speaker

Extend streaming STT to more languages. Extending to more languages, including low-resource ones, would be crucial.
Multi-speaker. The baseline currently expects only a single speaker, which should be extended towards multi-speaker environments and consistent re-identification of speakers.

Installation

The current version of BUD-E contains the following pretrained models:

Speech to Text Model: FastConformer Streaming STT by NVIDIA
Language Model: Microsoft Phi-2
Text to Speech Model: StyleTTS2

The model weights are downloaded and cached automatically when running the inference script for the first time.

To install BUD-E on your system follow these steps:

1) Setup Environment and Clone the Repo

We recommend to create a fresh conda environment with python 3.10.12.

conda create --name bud_e python==3.10.12
conda activate bud_e

Next, clone this repository. Make sure to pass the -recurse-submodules argument to clone the required submodules as well.

git clone --recurse-submodules https://github.com/LAION-AI/natural_voice_assistant.git

2) Install espeak-ng

Ubuntu:

sudo apt-get install festival espeak-ng mbrola

Windows:

Download and run the latest espeak-ng msi installer via https://github.com/espeak-ng/espeak-ng/releases
Add the path to the libespeak-ng.dll file to your conda environment:

conda env config vars set PHONEMIZER_ESPEAK_LIBRARY="C:\Program Files\eSpeak NG\libespeak-ng.dll"

Reactivate your conda environment

3) Install pytorch

Install torch and torchaudio using the configurator on https://pytorch.org/

4) Install Required Python Packages

Inside the repository run:

pip install -r requirements.txt

On Ubuntu, you might install portaudio which is required by pyaudio. If you encounter any errors with pyaudio, try to run:

sudo apt install portaudio19-dev

5) Start your AI conversation

Start BUD-E by running the main.py file inside the repository:

python main.py

Wait until all checkpoints are downloaded and all models are initialized. When "## Listening..." is prompted to the console, you can start speaking.
When starting the main.py a list of available audio devices is displayed in the terminal. By default the device with index 0 is used for recording. To select a specific audio device, you can use the --audio-device-idx argument and pass the device index you want to use.

Command-Line Arguments

Below are the available command-line arguments for starting the assistant:

Argument	Description	Default Value
`--audio-device-idx`	Select the audio device by index that should be used for recording. If no device idex is selected, the default audio device will be used.	`None`
`--audio-details`	Show details for the selcted audio device like the sample rate or number of audio channels.	`false`
`--tts-model`	Select the model that should be used for text to speech. You can choose between `StyleTTS2` and `WhisperSpeech`. Please note that `WhisperSpeech` relies on `torch.compile` which is not supported on windows. You can still use `WhisperSpeech` on Windows but the TTS inference will be very slow.	`StyleTTS2`

Troubleshooting

OSError: [Errno -9999] Unanticipated host error

This error could occur, if access to your audio device is denied. Please check your local settings and allow desktop apps to access the microphone.

OSError "invalid samplerate" or "invalid number of channels"

These are pyaudio related issues that occur if the selected audio device does not support the current sample rate or number of channels. Sample rate and channels are selected automatically regarding the current audio-device index that is used. If you encounter any problems related to pyaudio, use the --audio-device-idx argument and try a different device id. A list of all available audio-devices is printed when executing main.py.

Collaborating to Build the Future of Conversational AI

The development of BUD-E is an ongoing process that requires the collective effort of a diverse community. We invite open-source developers, researchers, and enthusiasts to join us in refining BUD-E's individual modules and contributing to its growth. Together, we can create an AI voice assistants that engage with us in natural, intuitive, and empathetic conversations.

If you're interested in contributing to this project, join our Discord community or reach out to us at [email protected].

natural_voice_assistant's People

Contributors

Stargazers

Watchers

natural_voice_assistant's Issues

ERROR: Could not build wheels for monotonic_align, which is required to install pyproject.toml-based projects

Environment:
Windows 10 Pro - 10.0.19045 Build 19045
Python - 3.10.12 (as requested)
Conda Version - 23.5.2

Steps for Reproduction:
Set up a new Conda environment using conda create -n env_name.
Activate the new environment using conda activate env_name.
Follow the installation instructions for the project up to the point of the error.
Run the command that caused the failure: [pip install -r requirements.txt]

Expected Behavior:
Package's should install without a hitch.

Actual Behavior:
Building wheel for monotonic_align (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for monotonic_align (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [3 lines of output]
core.c
C:\Users\Sytan\miniconda3\envs\bud_e\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\bin\HostX86\x64\cl.exe' failed with exit code 2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for monotonic_align
Failed to build monotonic_align
ERROR: Could not build wheels for monotonic_align, which is required to install pyproject.toml-based projects

Additional Notes:
My system uses a Dual GPU setup, though this has rarely caused issue for me in the past. I use a 3090 as my main GPU, with a 3060ti for secondary compute and LLM memory pooling

I have decided to post this error, as I can likely find my way around it, but many users may be deterred by this error. The effects of this error have yet to be fully tested.

Adding RAG

Following the conversation in the Discord channel, I'm planning to begin working on the Retriever-Augmented Generation (RAG). My approach will involve using FAISS for the vector database and for extracting text from PDF files, I'll be using PyPDF.

Please let me know if these new dependencies are ok.

I am doing some tests to use image to image description models to include those in the text from PyPDF.

For the embeddings I am planing to use mixedbread-ai/mxbai-embed-large-v1, based on the model size and MTEB performance. Please let me know if there is another model that is preferred.

add LICENSE

it's the thing that makes it actually open source and not just having code listed in a repo

StyleTTS2 for german language

A version of StyleTTS2 generating german language with good emphasis would be a giant leap forward in my opinion.
All the existing open source TTS systems for german are sounding somewhat robotic or indifferent. (no comparison to the english versions - as far as i can judge as a non native english speaking person).

"import sounddevice as sd" halts python

main.py gets stuck on Windows 11 "import sounddevice as sd" and simply halts. Win 11 Pro 22635.3140

Can you make ui for this or can I run this in colab?

[Feature Request] for Home Assistant Connectivity with BUD-E

Objective: Integrate BUD-E with Home Assistant to enhance smart home automation through advanced AI-driven voice interaction.

Description: This feature request proposes the development of an add-on or plugin that enables BUD-E, an open AI voice assistant, to control and interact with Home Assistant's ecosystem. The integration aims to leverage BUD-E's conversational quality, naturalness, and empathy to provide users with a more intuitive and personalized smart home experience.

Key Features:

Voice Control: Allow users to use natural language voice commands through BUD-E to manage Home Assistant entities like lights, thermostats, and security systems.
Contextual Awareness: Utilize BUD-E's ability to understand complex contexts and previous interactions to offer personalized automation suggestions and actions.
Privacy and Local Processing: Ensure that the integration prioritizes user privacy by processing voice commands locally, in alignment with both BUD-E's and Home Assistant's commitment to data privacy.
Easy Setup and Configuration: Design the integration to be user-friendly, with simple setup procedures for connecting BUD-E with Home Assistant, including a clear interface for linking devices and entities.
Benefits:

Enhanced user experience with more natural and intuitive voice interactions.
Increased personalization and efficiency in smart home automation.
Strengthened privacy through local processing of voice commands.
Request: We invite the development community and Home Assistant contributors to collaborate on this project to bring seamless AI-driven voice control to the Home Assistant platform, enriching the smart home experience for users worldwide.

FileNotFoundError: [Errno 2] No such file or directory: 'latencies/times_stt'

Listening...

what
is
the
cap
ital
of
france

Total Latency: 0.939

Process Process-2:
Traceback (most recent call last):
  File "C:\webui\installer_files\conda\envs\bud_e\lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\webui\installer_files\conda\envs\bud_e\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "d:\repo\natural_voice_assistant\main.py", line 228, in main_loop
    text, wav, interrupt = model(chunk_audio, chunk_lengths)
  File "C:\webui\installer_files\conda\envs\bud_e\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\webui\installer_files\conda\envs\bud_e\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "d:\repo\natural_voice_assistant\models_voice_assistant\stt_llm_tts_model.py", line 699, in forward
    end, response, wav = self.handle_stop_conditions()
  File "d:\repo\natural_voice_assistant\models_voice_assistant\stt_llm_tts_model.py", line 562, in handle_stop_conditions
    with open('latencies/times_stt', 'w') as fout:
FileNotFoundError: [Errno 2] No such file or directory: 'latencies/times_stt'

If I make it manually it works fine, perhaps consider a placeholder in the git? :)

RuntimeError: espeak not installed on your system

Environment:
Windows 10 Pro - 10.0.19045 Build 19045
Python - 3.10.12 (as requested)
Conda Version - 23.5.2

Steps for Reproduction:
After full installation as directed, I tried to run [python main.py]
At this point, I get an error stating that eSpeak is not installed

Expected Behavior:
BUD-E should boot and function for conversation

Actual Behavior:
Traceback (most recent call last):
File "E:\BUD-E\natural_voice_assistant\main.py", line 175, in
main()
File "E:\BUD-E\natural_voice_assistant\main.py", line 169, in main
model = STT_LLM_TTS(device=device)
File "E:\BUD-E\natural_voice_assistant\stt_llm_tts_model.py", line 371, in init
self.tts = TTS(device=device)
File "E:\BUD-E\natural_voice_assistant\stt_llm_tts_model.py", line 333, in init
self.tts_model = StyleTTS2Model(device=device)
File "E:\BUD-E\natural_voice_assistant\style_tts2_model.py", line 38, in init
self.global_phonemizer = phonemizer.backend.EspeakBackend(language='en-us', preserve_punctuation=True, with_stress=True)
File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\phonemizer\backend\espeak\espeak.py", line 45, in init
super().init(
File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\phonemizer\backend\espeak\base.py", line 39, in init
super().init(
File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\phonemizer\backend\base.py", line 77, in init
raise RuntimeError( # pragma: nocover
RuntimeError: espeak not installed on your system

Additional Notes:
The command provided to run is as follows: conda env config vars set PHONEMIZER_ESPEAK_LIBRARY="C:\Program Files\eSpeak NG\libespeak-ng.dll", however, I found that SpeakE was installed at dir: C:\Program Files (x86)\eSpeak NG\libespeak-ng.dll. After fixing this and pointing to the right file dir, I am still getting the same issue stating that I do not have SpeakE installed. I made sure to use the command as stated, and additionally restarted my conda env after pointing it.

pyaudio device selection improvements... and sounddevice/alsa conflicts

Okay, before I submit PRs I think I should get some feedback. I included a diff below for your reference.

Due to the threading used by pyaudio, alsa can get a conflict with the sound device being accessed multiple times. (Fix is below in the patch).
The primary error one would see is:
OSError: [Errno -9993] Illegal combination of I/O devices

A more-full error would be something like:

File "/home/j/src/ai/bud-e/./main.py", line 50, in record
streamIn = audio.open(format=pyaudio.paFloat32, channels=1,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/.../python3.11/site-packages/pyaudio/__init__.py", line 639, in open
stream = PyAudio.Stream(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/.../python3.11/site-packages/pyaudio/__init__.py", line 441, in __init__
self._stream = pa.open(**arguments)
^^^^^^^^^^^^^^^^^^^^
OSError: [Errno -9993] Illegal combination of I/O devices

Online posts led me to find it was related to the threading, and sounddevice being imported globally (in main.py). Placing it in play_audio() addressed it (it's the only place sd is used in main.py).

...Now, on to the pyaudio issue.

We currently just hardcode device index 0. I hacked in a global variable containing desired device name priorities (which is why I am not putting it in a PR), and a function to filter/select devices based on those priorities, falling back on error to index 0 like before (You'll see the sounddevice import change as well as my device selection additions):

diff --git a/main.py b/main.py
index 908ba82..c5c42f4 100644
--- a/main.py
+++ b/main.py
@@ -3,11 +3,35 @@ import time
 import torch
 import pyaudio
 import multiprocessing
-import sounddevice as sd
+import sys
 from preprocessor import Preprocessor
 from streaming_buffer import StreamBuffer
 from stt_llm_tts_model import STT_LLM_TTS
 
+# Define your pyaudio device preferences here
+dev_pref_names=[ 'pulse', 'pipewire', 'default', ]
+
+def list_pyaudio_devices(audio):
+    print("  Available pyaudio devices:", file=sys.stderr)
+    for i in range(audio.get_device_count()):
+        dev = audio.get_device_info_by_index(i)
+        print((i,dev['name'],dev['maxInputChannels']),
+              file=sys.stderr)
+
+def get_pyaudio_input_idx(audio):
+    list_pyaudio_devices(audio)
+    for pref_name in dev_pref_names:
+        for i in range(audio.get_device_count()):
+            dev = audio.get_device_info_by_index(i)
+            # print(f"[{i}] pref({pref_name}) == dev({dev['name']})")
+            # print(f" {type(pref_name)}) == {type(dev['name'])}")
+            if pref_name == dev['name'].lower() and dev['maxInputChannels'] > 0:
+                print(f"Selecting pyaudio device {dev['name']}",
+                      file=sys.stderr)
+                return i
+    print("Couldn't find a pyaudio device from our dev_pref_names (f{dev_pref_names}). Picking idx 0", file=sys.stderr)
+    list_pyaudio_devices(audio)
+    return 0
 
 def record(audio_buffer, start_recording):
     """Record an audio stream from the microphone in a separate process  
@@ -20,8 +44,10 @@ def record(audio_buffer, start_recording):
 
     # Open audio input stream
     audio = pyaudio.PyAudio()
+    dev_idx = get_pyaudio_input_idx(audio)
+    print(f"Using pyaudio device index {dev_idx}", file=sys.stderr)
     streamIn = audio.open(format=pyaudio.paFloat32, channels=1,
-                            rate=RATE, input=True, input_device_index=0,
+                            rate=RATE, input=True, input_device_index=dev_idx,
                             frames_per_buffer=CHUNK)
     
     while(True):
@@ -45,6 +71,7 @@ def play_audio(audio_output_buffer):
         Args:
             audio_output_buffer: multiprocessing-queue to receive audio data
     """
+    import sounddevice as sd
     fs = 24000
     while(True):
         # get next audio data

OSError: [Errno -9999] Unanticipated host error pyaudio if you deny access to microphone in windows settings

The system will report OSError: [Errno -9999] Unanticipated host error pyaudio if you deny access to microphone in windows settings

And close out

`Currently input device with id 1 is used for recording. To change the audio device, please use the --audio-device-idx parameter.

Traceback (most recent call last):
File "d:\repo\natural_voice_assistant\main.py", line 306, in
main()
File "d:\repo\natural_voice_assistant\main.py", line 296, in main
record(audio, sample_rate, audio_channels, audio_input_buffer, start_recording, args.audio_device_idx, args.audio_details)
File "d:\repo\natural_voice_assistant\main.py", line 115, in record
streamIn = audio.open(format=pyaudio.paFloat32, channels=channels,
File "C:\webui\installer_files\conda\envs\bud_e\lib\site-packages\pyaudio_init_.py", line 639, in open
stream = PyAudio.Stream(self, *args, **kwargs)
File "C:\webui\installer_files\conda\envs\bud_e\lib\site-packages\pyaudio_init_.py", line 441, in init
self._stream = pa.open(**arguments)
OSError: [Errno -9999] Unanticipated host error`

Not really a bug but perhaps add it to the install notes.

Gibberish (random?) output after I say only "hello".

The assistant begins saying:

\`\`\`def count_vowels(word):    vowels = ['a', 'e', 'i', 'o', 'u']    count = 0    for letter in word
## Total Latency:  0.545
:        if letter.

... and she continues. :)

I thought it was the result of local echo, so I enabled module-echo-cancel (and tested it and it completely inhibits feedback). Unfortunately it has no effect on the assistant outputting continual gibberish.) :)

ModleNotFoundError utils && models in models_voice_assistant/TTS/style_tts2_model.py

Environment:
Ubuntu - 22.04.3 LTS (Jammy Jellyfish)
Python 3.11.2
Pipenv - 2022.11.30

Steps for Reproduction:
After full installation as directed, I tried to run [python main.py]
At this point, I get:

ModuleNotFoundError: No module named 'utils'
(* ModuleNotFoundError: No module named 'models')

Expected Behavior:
BUD-E should boot and function for conversation

Actual Behavior:

Traceback (most recent call last):
  File "/<dir>/natural_voice_assistant/main.py", line 13, in <module>
    from models_voice_assistant.stt_llm_tts_model import STT_LLM_TTS
  File "/<dir>/natural_voice_assistant/models_voice_assistant/stt_llm_tts_model.py", line 6, in <module>
    from models_voice_assistant.TTS.style_tts2_model import StyleTTS2Model
  File "/<dir>/natural_voice_assistant/models_voice_assistant/TTS/style_tts2_model.py", line 10, in <module>
    from utils import *
ModuleNotFoundError: No module named 'utils'

Additional Notes:

At first sight, I tried to fix the "utils" import by installing the python3-utils packages. And I got the same error for the models import in the same file. I realized that the utils package isn't the one I installed, so I removed it.

error while installing - Pyaudio

Hello,

Installing requirements, I get this: "ERROR: Could not build wheels for pyaudio, which is required to install pyproject.toml-based projects"
Running Ubuntu JellyFish with 32 G RAM and NVIDIA GTX 1650

Thanks!

Windows install requires the additional installation of punkt

Environment:
Windows 10 Pro - 10.0.19045 Build 19045
Python - 3.10.12 (as requested)
Conda Version - 23.5.2

Steps for Reproduction:
After full installation as directed, I tried to run [python main.py]
At this point, I get LookupError: Resource punkt not found.

Expected Behavior:
BUD-E should boot and function for conversation

Actual Behavior:

  File "E:\BUD-E\natural_voice_assistant\main.py", line 175, in <module>
    main()
  File "E:\BUD-E\natural_voice_assistant\main.py", line 169, in main
    model = STT_LLM_TTS(device=device)
  File "E:\BUD-E\natural_voice_assistant\stt_llm_tts_model.py", line 371, in __init__
    self.tts = TTS(device=device)
  File "E:\BUD-E\natural_voice_assistant\stt_llm_tts_model.py", line 338, in __init__
    self.forward("warming up!")
  File "E:\BUD-E\natural_voice_assistant\stt_llm_tts_model.py", line 349, in forward
    wav = self.tts_model(text)
  File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\BUD-E\natural_voice_assistant\style_tts2_model.py", line 115, in forward
    ps = word_tokenize(ps[0])
  File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\nltk\tokenize\__init__.py", line 129, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\nltk\tokenize\__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\nltk\data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\nltk\data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "C:\Users\Sytan\miniconda3\envs\bud_e\lib\site-packages\nltk\data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/english.pickle

  Searched in:
    - 'C:\\Users\\Sytan/nltk_data'
    - 'C:\\Users\\Sytan\\miniconda3\\envs\\bud_e\\nltk_data'
    - 'C:\\Users\\Sytan\\miniconda3\\envs\\bud_e\\share\\nltk_data'
    - 'C:\\Users\\Sytan\\miniconda3\\envs\\bud_e\\lib\\nltk_data'
    - 'C:\\Users\\Sytan\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''
**********************************************************************

Additional Notes:

I did as the error told me and created my own .py file called installpunkt.py. After running it using the 2 lines suggested by the error, it installed punkt and allowed BUD-E to boot properly and begin listening/speaking

This issue could also likely be solved by running

python.exe -c 'import nltk; nltk.download("punkt")'

Raised Error Related Models

python main.py
Traceback (most recent call last):
File "/home/y3/Desktop/try/p/bin/natural_voice_assistant/main.py", line 13, in
from models_voice_assistant.stt_llm_tts_model import STT_LLM_TTS
File "/home/y3/Desktop/try/p/bin/natural_voice_assistant/models_voice_assistant/stt_llm_tts_model.py", line 6, in
from models_voice_assistant.TTS.style_tts2_model import StyleTTS2Model
File "/home/y3/Desktop/try/p/bin/natural_voice_assistant/models_voice_assistant/TTS/style_tts2_model.py", line 11, in
from models import *
ModuleNotFoundError: No module named 'models'

Consider Integrating Vocode

Hello Team,

I recently came across your project and was impressed by its capabilities. As a developer working on the Vocode project, I wanted to suggest integrating Vocode into your system. Vocode is an open-source tool designed for building voice-based LLM agents, optimized for real-time streaming conversations.

I believe Vocode could be a valuable addition to your project, enhancing both performance and scalability. Please feel free to check out our documentation and explore how Vocode might fit into your workflow: Vocode Local Conversation.

Looking forward to any thoughts or questions you might have!

Best regards,

laion-ai / natural_voice_assistant Goto Github PK

natural_voice_assistant's Introduction

BUD-E: A conversational and empathic AI Voice Assistant

Quick Start

Roadmap

Reducing Latency & minimizing systems requirements

Increasing Naturalness of Speech and Responses

Keeping track of conversations over days, months and years

Enhancing functionality and ability of voice assistant

Enhancing multi-modal and emotional context understanding

Building a UI, CI and easy packaging infrastructure

Extending to multi-language and multi-speaker

Installation

1) Setup Environment and Clone the Repo

2) Install espeak-ng

Ubuntu:

Windows:

3) Install pytorch

4) Install Required Python Packages

5) Start your AI conversation

Command-Line Arguments

Troubleshooting

OSError: [Errno -9999] Unanticipated host error

OSError "invalid samplerate" or "invalid number of channels"

Collaborating to Build the Future of Conversational AI

natural_voice_assistant's People

Contributors

Stargazers

Watchers

Forkers

natural_voice_assistant's Issues

Listening...

Total Latency: 0.939

Recommend Projects

Recommend Topics

Recommend Org