olney1 / chatgpt-openai-smart-speaker

This AI Smart Speaker uses speech recognition and text-to-speech to enable voice-driven conversations and vision capabilities with OpenAI and Agents. The user speaks a prompt into the microphone, and the program sends the prompt to OpenAI to generate a response. The response is then converted to an audio file and played back to the user.

License: MIT License

Python 100.00%
chatgpt openai smarthome smartspeaker ai artificial-intelligence gpt-4 speech-recognition speech-to-text text-to-speech agents langchain langsmith vision

chatgpt-openai-smart-speaker's Introduction

ChatGPT Smart Speaker (speech recognition and text-to-speech using OpenAI and Google Speech Recognition)

Jeff the smart speaker


Video Demos

Video Demo using activation word "Jeffers"

Video Demo with Vision

Equipment List:


Running on your PC/Mac (use the chat.py or test.py script)

The chat.py and test.py scripts run directly on your PC/Mac. Both use speech recognition to capture a spoken prompt, send it to OpenAI to generate a response, convert the response to an audio file with gTTS, and play it back. Your machine must have a working default microphone and speakers for these scripts to work. Please note that these scripts were developed on a Mac, so additional dependencies may be required on Windows and Linux. The difference between them is that chat.py is faster and always listening, while test.py acts like a standard smart speaker, only responding once it hears the activation word (currently set to 'Jeffers').
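The listen, ask, speak loop described above can be sketched roughly as follows. This is an illustrative sketch, not the repo's actual code: the function names (`heard_wake_word`, `listen_once`, `speak`) are invented for clarity.

```python
# Illustrative sketch of the listen -> ask -> speak loop; function names are
# invented here and do not match the repo's actual code.

def heard_wake_word(transcript, wake_word="jeffers"):
    """test.py-style activation check: is the wake word in the transcript?"""
    return wake_word in transcript.lower()

def listen_once():
    # Requires: pip install SpeechRecognition pyaudio
    import speech_recognition as sr
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
    # Google Speech Recognition, as the scripts use
    return r.recognize_google(audio)

def speak(text, language="en"):
    # Requires: pip install gTTS playsound
    from gtts import gTTS
    from playsound import playsound
    gTTS(text=text, lang=language).save("response.mp3")
    playsound("response.mp3")
```

In chat.py every transcript is sent straight to OpenAI; in test.py a check like `heard_wake_word` gates the conversation first.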


Running on Raspberry Pi (use the pi.py script)

New The pi.py script is a more advanced custom version of the smart_speaker.py script and is the closest to a real smart speaker. Its purpose is to offload wake-word detection to a custom model built with PicoVoice (https://console.picovoice.ai/), which improves efficiency and long-term reliability. This will be the main script for development going forward, with more advanced features added regularly.


Prerequisites - chat.py

  • You need to have a valid OpenAI API key. You can sign up for a free API key at https://platform.openai.com.
  • You'll need to be running Python version 3.7.3 or higher. I am using 3.11.4 on a Mac and 3.7.3 on Raspberry Pi.
  • Run brew install portaudio after installing Homebrew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • You need to install the following packages: openai, gTTS, pyaudio, SpeechRecognition, playsound, python-dotenv and pyobjc if you are on a Mac. You can install these packages using pip or use pipenv if you wish to contain a virtual environment.
  • Firstly, update your tools: pip install --upgrade pip setuptools then pip install openai pyaudio SpeechRecognition gTTS playsound python-dotenv apa102-pi gpiozero pyobjc

Prerequisites - pi.py New

To run pi.py you will need a Raspberry Pi 4b (I'm using the 4GB model but 2GB should be enough), ReSpeaker 4-Mic Array for Raspberry Pi and USB speakers.

You will also need a developer account and API key with OpenAI (https://platform.openai.com/overview), a Tavily Search agent API key (https://app.tavily.com/sign-in), and a PicoVoice Access Key (https://console.picovoice.ai/) plus a custom voice model (https://console.picovoice.ai/ppn). Please create your own voice model and download the version built for Raspberry Pi.

Now on to the Pi setup. Let's get started!

Run the following on your Raspberry Pi terminal:

  1. sudo apt update

  2. sudo apt install python3-gpiozero

  3. git clone https://github.com/Olney1/ChatGPT-OpenAI-Smart-Speaker

  4. Firstly, update your tools: pip install --upgrade pip setuptools, then pip install openai pyaudio SpeechRecognition gTTS pydub python-dotenv apa102-pi gpiozero. Next, install the remaining dependencies: pip install -r requirements.txt. I am using Python 3.9 (#!/usr/bin/env python3.9). You can install these packages with pip, or use pipenv if you prefer a contained virtual environment.

  5. PyAudio relies on PortAudio as a dependency. You can install it using the following command: sudo apt-get install portaudio19-dev

  6. Pydub dependencies: You need to have ffmpeg installed on your system. On a Raspberry Pi you can install it using: sudo apt-get install ffmpeg. You may also need simpleaudio if you run into issues with the script hanging when finding the wake word, so it's best to install these packages just in case: sudo apt-get install python3-dev (for development headers to compile) and install simpleaudio (for a different backend to play mp3 files) and sudo apt-get install libasound2-dev (necessary dependencies).

  7. If you are using the RESPEAKER, follow this guide to install the required dependencies: (https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/#getting-started). Then install support for the lights on the RESPEAKER board. You'll need APA102 LED: sudo apt install -y python3-rpi.gpio and then sudo pip3 install apa102-pi.

  8. Activate SPI: sudo raspi-config; go to "Interface Options" > "SPI" and enable it. While you are at it, change the default password! Exit the tool and reboot.

  9. Get the Seeed voice card source code, install and reboot:
     git clone https://github.com/HinTak/seeed-voicecard.git
     cd seeed-voicecard
     sudo ./install.sh
     sudo reboot now

  10. Finally, select the audio output on the Raspberry Pi: sudo raspi-config > Select 1 System Options > Select S2 Audio > Select your preferred audio output device > Select Finish.


Usage - applies to chat.py:

  1. You'll need to set up the environment variable for your OpenAI API key. To do this, create a .env file in the same directory and add your API key to the file like this: OPENAI_API_KEY="API KEY GOES HERE". This is safer than hard-coding your API key into the program. You must not change the name of the variable OPENAI_API_KEY.
  2. Run the script using python chat.py.
  3. The script will prompt you to say something. Speak a sentence into your microphone. You may need to allow the program permission to access your microphone on a Mac, a prompt should appear when running the program.
  4. The script will send the spoken sentence to OpenAI, generate a response using the text-to-speech model, and play the response as an audio file.
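The .env setup in step 1 can be read back in Python roughly like this (a sketch with an illustrative `load_api_key` helper, not the repo's actual code):

```python
import os

def load_api_key(var="OPENAI_API_KEY"):
    """Illustrative helper: read the API key from .env or the environment."""
    # Requires: pip install python-dotenv
    try:
        from dotenv import load_dotenv
        load_dotenv()  # reads the .env file in the current directory, if present
    except ImportError:
        pass  # fall back to plain environment variables
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"{var} not set - check your .env file")
    return key
```

Keeping the key out of the source means you can commit the script without committing your credentials.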

Usage - applies to pi.py

  1. You'll need to set up environment variables for your OpenAI API key, PicoVoice Access Key and Tavily API key (for agent searches). To do this, create a .env file in the same directory and add your keys to the file like this: OPENAI_API_KEY="API KEY GOES HERE", ACCESS_KEY="PICOVOICE ACCESS KEY GOES HERE" and TAVILY_API_KEY="API KEY GOES HERE". This is safer than hard-coding your API keys into the program.
  2. Ensure that you have the pi.py script along with apa102.py and alexa_led_pattern.py scripts in the same folder saved on your Pi if using ReSpeaker.
  3. Run the script using python3 pi.py or python3 pi.py 2> /dev/null on the Raspberry Pi. The second option omits all developer warnings and errors to keep the console focused purely on the print statements.
  4. The script will prompt you to say the wake word, which is programmed into the custom Picovoice wake-word model as 'Jeffers'. You can change this to any name you want. Once the wake word has been detected, the lights will turn blue and the speaker is ready for your question. When you have asked your question (or when the microphone picks up and processes noise), the lights will rotate blue, meaning your recorded question is being sent to OpenAI.
  5. The script will then generate a response using the text-to-speech model, and play the response as an audio file.
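The wake-word step can be sketched with Picovoice's Python SDK roughly as follows. This assumes the pvporcupine and pvrecorder packages; the function names are illustrative and may differ from pi.py:

```python
# Hedged sketch of a Porcupine wake-word loop; names are illustrative.

def detected(keyword_index):
    """porcupine.process() returns the keyword index on a hit, otherwise -1."""
    return keyword_index >= 0

def run_wake_word_loop(access_key, keyword_path):
    # Requires: pip install pvporcupine pvrecorder
    import pvporcupine
    from pvrecorder import PvRecorder

    porcupine = pvporcupine.create(access_key=access_key,
                                   keyword_paths=[keyword_path])
    recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)
    recorder.start()
    try:
        while True:
            # Each read() yields one frame of 16-bit PCM samples
            if detected(porcupine.process(recorder.read())):
                print("Wake word heard - ready for your question")
                break
    finally:
        recorder.stop()
        recorder.delete()
        porcupine.delete()
```

Running detection on-device like this is why pi.py is more efficient than streaming every utterance to a speech API.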

Customisation

  • You can change the OpenAI model engine by modifying the value of model_engine. For example, to use the "gpt-3.5-turbo" model for a cheaper and quicker response (with a knowledge cut-off of September 2021), set model_engine = "gpt-3.5-turbo".
  • You can change the language of the generated audio file by modifying the value of language. For example, to generate audio in French, set language = 'fr'.
  • You can adjust the temperature parameter in the following line to control the randomness of the generated response:
    response = client.chat.completions.create(
        model=model_engine,
        messages=[{"role": "system", "content": "You are a helpful smart speaker called Jeffers!"},  # Play about with more context here.
                  {"role": "user", "content": prompt}],
        max_tokens=1024,
        n=1,
        temperature=0.7,
    )
    return response

Higher values of temperature will result in more diverse and random responses, while lower values will result in more deterministic responses.
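As a toy illustration of why this happens (this is not OpenAI's implementation, just the standard temperature-scaled softmax over next-token scores):

```python
# Toy illustration: temperature rescales the probability distribution
# over next tokens before one is sampled.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_low = softmax_with_temperature([2.0, 1.0, 0.1], 0.2)
probs_high = softmax_with_temperature([2.0, 1.0, 0.1], 2.0)
# Low temperature concentrates probability on the top choice (deterministic);
# high temperature flattens the distribution (diverse/random).
```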


Important notes for Raspberry Pi Installation

If you are using the same USB speaker as in my video, you will need to run sudo apt-get install pulseaudio to install support for it. You may also need to set pulseaudio to start on every boot: pulseaudio --start.

Adding a Start Command on Boot

Open the terminal and type: sudo nano /etc/rc.local

After the important network/start commands, add this: su -l pi -c 'cd /home/pi/ChatGPT-OpenAI-Smart-Speaker && pulseaudio --start && python3 pi.py 2> /dev/null'

Be sure to leave the line exit 0 at the end, then save the file and exit. In nano, to exit, type Ctrl-x, and then Y

ReSpeaker

If you want to use ReSpeaker for the lights, you can purchase this from most of the major online stores that stock Raspberry Pi. Here is the online guide: https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/

To test your microphone and speakers install Audacity on your Raspberry Pi:

sudo apt update

sudo apt install audacity

audacity

Other Possible Issues

On the Raspberry Pi you may encounter an error regarding the installation of flac.

See here for the resolution: https://raspberrypi.stackexchange.com/questions/137630/im-unable-to-install-flac-on-my-raspberry-pi-3

The files you will need are here: https://archive.raspbian.org/raspbian/pool/main/f/flac/
Please note the links below may have changed or been updated, so refer back to the link above for the latest file names and adjust the commands accordingly.

sudo apt-get install libogg0

wget https://archive.raspbian.org/raspbian/pool/main/f/flac/libflac8_1.3.2-3+deb10u3_armhf.deb

wget https://archive.raspbian.org/raspbian/pool/main/f/flac/flac_1.3.2-3+deb10u3_armhf.deb

sudo dpkg -i libflac8_1.3.2-3+deb10u3_armhf.deb

sudo dpkg -i flac_1.3.2-3+deb10u3_armhf.deb

which flac
# /usr/bin/flac

sudo reboot

flac --version
# flac 1.3.2

You may find you need to install GStreamer if you encounter errors regarding Gst.

Install GStreamer: Open a terminal and run the following command to install GStreamer and its base plugins:

sudo apt-get install gstreamer1.0-tools gstreamer1.0-plugins-base gstreamer1.0-plugins-good

This installs the GStreamer core, along with a set of essential and good-quality plugins.

Next, you need to install the Python bindings for GStreamer. Use this command:

sudo apt-get install python3-gst-1.0

This command installs the GStreamer bindings for Python 3.

Install Additional GStreamer Plugins (if needed): Depending on the audio formats you need to work with, you might need additional GStreamer plugins. For example, to install plugins for MP3 playback, use:

sudo apt-get install gstreamer1.0-plugins-ugly

To quit a running script on Pi from boot: ALT + PrtScSysRq (or Print button) + K


Credit to:

https://github.com/tinue/apa102-pi & Seeed Technology Limited for supplementary code.


Read more about what is next for the project

https://medium.com/@ben_olney/openai-smart-speaker-with-raspberry-pi-5e284d21a53e

chatgpt-openai-smart-speaker's People

Contributors

benjam23, olney1


chatgpt-openai-smart-speaker's Issues

Sound not playing with play_audio_file()

Redefine play_audio_file() in smart_speaker.py using pydub

The initial code looks like below:

def play_audio_file():
    # play the audio file and wake speaking LEDs
    pixels.speak()
    # os.system("mpg321 response.mp3")
    playsound("response.mp3", block=False) # There’s an optional second argument, block, which is set to True by default. Setting it to False makes the function run asynchronously.

However, we can simply fix this using pydub package instead of playsound.

This is the code that works for me:

from pydub import AudioSegment
from pydub.playback import play

def play_audio_file():
    song = AudioSegment.from_mp3("response.mp3")
    play(song)

Testing Microphone and Speaker

Record sound with Python

When using the code which the ReSpeaker Website provided, some errors are generated.

To run the following examples, clone the https://github.com/respeaker/4mics_hat.git repository to your Raspberry Pi:

git clone https://github.com/respeaker/4mics_hat.git

All the Python scripts mentioned in the examples below can be found inside this repository. To install the necessary dependencies, run the following from the 4mics_hat repository folder:

sudo apt-get install portaudio19-dev libatlas-base-dev
cd 4mics_hat/ # Do not forget to change to the correct directory
pip3 install -r requirements.txt

We use the PyAudio library to record sound with Python.

python3 recording_examples/get_device_index.py

You will see the device ID as below.

Input Device id  2  -  seeed-4mic-voicecard: - (hw:1,0)

To record sound, open the recording_examples/record.py file with nano or another text editor and change RESPEAKER_INDEX = 2 to the index number of the ReSpeaker on your system. Then run the script to make a recording:

python3 recording_examples/record.py

To play the recorded samples you can use aplay:

aplay output.wav # Recorded voice will be saved in the output.wav file
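A quick stdlib check can confirm the recorded WAV file's format before you debug playback. This sketch synthesises its own output.wav (one second of silence) so it is self-contained; with a real recording, skip the writing step:

```python
# Inspect a WAV file's channel count, sample rate and length with the stdlib.
import struct
import wave

# Synthesise a stand-in output.wav: mono, 16-bit, 16 kHz, one second of silence.
# With a real record.py recording, skip this block.
with wave.open("output.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 2 bytes = 16-bit samples
    w.setframerate(16000)
    w.writeframes(struct.pack("<h", 0) * 16000)

with wave.open("output.wav", "rb") as w:
    print(w.getnchannels(), w.getframerate(), w.getnframes())  # -> 1 16000 16000
```

If the reported rate or channel count looks wrong, the recording settings (not the speaker) are the likely culprit.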

If the sound does not play, see another issue page I published to verify that both microphone and speaker are connected properly: #4

Closed uncommitted pull requests

Hi, I sent you three pull requests dealing with a warning when using stream to file. You closed them without committing. Will you fix it yourself?

PyAudio IOError: No Default Input Device Available

Hi, I installed all the components on an AWS EC2 instance (Ubuntu 20) and got this error message when I run python smart_speaker.py:

ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM default
ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_id returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1334:(snd_func_refer) error evaluating name
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM dmix
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Traceback (most recent call last):
File "/root/ChatGPT-OpenAI-Smart-Speaker/smart_speaker.py", line 68, in
main()
File "/root/ChatGPT-OpenAI-Smart-Speaker/smart_speaker.py", line 59, in main
prompt = recognize_speech()
File "/root/ChatGPT-OpenAI-Smart-Speaker/smart_speaker.py", line 16, in recognize_speech
with sr.Microphone() as source:
File "/usr/local/lib/python3.10/dist-packages/speech_recognition/init.py", line 99, in init
device_info = audio.get_device_info_by_index(device_index) if device_index is not None else audio.get_default_input_device_info()
File "/usr/lib/python3/dist-packages/pyaudio.py", line 949, in get_default_input_device_info
device_index = pa.get_default_input_device

Any idea how I can fix this?
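A common fix for this class of error is to enumerate the available input devices and pass an explicit device_index to sr.Microphone, since a headless box like an EC2 instance has no default ALSA input device at all. A hedged sketch (function names are illustrative):

```python
# Illustrative helpers for picking an explicit input device when
# sr.Microphone() cannot find a default one.

def list_input_devices():
    # Requires: pip install SpeechRecognition pyaudio
    import speech_recognition as sr
    # Each entry is (index, device name); pick the index of your microphone.
    return list(enumerate(sr.Microphone.list_microphone_names()))

def open_microphone(device_index):
    import speech_recognition as sr
    # Pass the chosen index explicitly instead of relying on the default.
    return sr.Microphone(device_index=device_index)
```

Note that on a server with no audio hardware attached, no index will work; the script needs a machine with a real (or virtual) microphone.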

Error, not working on a Windows 10 laptop

D:\New folder\ChatGPT-OpenAI-Smart-Speaker-main>python smart_speaker.py
Say something!
result2:
{ 'alternative': [ { 'confidence': 0.79627585,
'transcript': 'same time captain'},
{'transcript': 'same time come on'},
{'transcript': 'same time'},
{'transcript': 'same time okay'},
{'transcript': 'same time cap on'}],
'final': True}
Google Speech Recognition thinks you said same time captain
result2:
{ 'alternative': [ { 'confidence': 0.79627585,
'transcript': 'same time captain'},
{'transcript': 'same time come on'},
{'transcript': 'same time'},
{'transcript': 'same time okay'},
{'transcript': 'same time cap on'}],
'final': True}
This is the prompt being sent to OpenAIsame time captain
james cook was born in marton-in-cleveland, england

1728
'mpg321' is not recognized as an internal or external command,
operable program or batch file.

Error 263 for command:
    open response.mp3
The specified device is not open or is not recognized by MCI.

Error 263 for command:
    close response.mp3
The specified device is not open or is not recognized by MCI.

Failed to close the file: response.mp3
Traceback (most recent call last):
File "D:\New folder\ChatGPT-OpenAI-Smart-Speaker-main\smart_speaker.py", line 68, in
main()
File "D:\New folder\ChatGPT-OpenAI-Smart-Speaker-main\smart_speaker.py", line 65, in main
play_audio_file()
File "D:\New folder\ChatGPT-OpenAI-Smart-Speaker-main\smart_speaker.py", line 55, in play_audio_file
playsound("response.mp3", block=False) # There’s an optional second argument, block, which is set to True by default. Setting it to False makes the function run asynchronously.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\thucn\AppData\Local\Programs\Python\Python311\Lib\site-packages\playsound.py", line 72, in _playsoundWin
winCommand(u'open {}'.format(sound))
File "C:\Users\thucn\AppData\Local\Programs\Python\Python311\Lib\site-packages\playsound.py", line 64, in winCommand
raise PlaysoundException(exceptionMessage)
playsound.PlaysoundException:
Error 263 for command:
open response.mp3
The specified device is not open or is not recognized by MCI.

ReSpeaker 4mic Array for Raspberry Pi is not detected

Install another driver for the ReSpeaker

If you are using the ReSpeaker, you may encounter a problem where the Raspberry Pi does not recognise the microphone.

You can check if this is the case using the code below:

arecord -L

If you have installed the correct driver, you will probably see the result like this:

pi@raspberrypi:~ $ arecord -L
null
    Discard all samples (playback) or generate zero samples (capture)
jack
    JACK Audio Connection Kit
pulse
    PulseAudio Sound Server
default
playback
ac108
sysdefault:CARD=seeed4micvoicec
    seeed-4mic-voicecard,
    Default Audio Device
dmix:CARD=seeed4micvoicec,DEV=0
    seeed-4mic-voicecard,
    Direct sample mixing device
dsnoop:CARD=seeed4micvoicec,DEV=0
    seeed-4mic-voicecard,
    Direct sample snooping device
hw:CARD=seeed4micvoicec,DEV=0
    seeed-4mic-voicecard,
    Direct hardware device without any conversions
plughw:CARD=seeed4micvoicec,DEV=0
    seeed-4mic-voicecard,
    Hardware device with all software conversions
usbstream:CARD=seeed4micvoicec
    seeed-4mic-voicecard
    USB Stream Output
usbstream:CARD=ALSA
    bcm2835 ALSA
    USB Stream Output

If your Raspberry Pi does not show CARD=seeed4micvoicec, the microphone is not detected.

If that is your case, you should follow the steps below:

Step 1

If you have already installed the wrong driver, I recommend restarting from a fresh Raspberry Pi OS installation (a 64GB SD card is recommended).
When you have finished, access your Raspberry Pi and get the Seeed voice card source code:

sudo apt-get update
git clone https://github.com/HinTak/seeed-voicecard.git
cd seeed-voicecard
sudo ./install.sh
sudo reboot

Step 2

Select audio output on Raspberry Pi:

sudo raspi-config
# Select 1 System options
# Select S2 Audio
# Select your preferred Audio output device (USB if you are connecting speaker with usb cable)
# Select Finish

Step 3

Check that the sound card now appears. Run arecord -L again and confirm the output includes the seeed4micvoicec entries shown above (for example sysdefault:CARD=seeed4micvoicec).

If you get the same results, congrats you have now finished setting up the ReSpeaker!

If you want to test it, you can go through the code that the ReSpeaker maker published on their website:
https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/
