primedeviation / verbi

This project forked from promtengineer/verbi

A modular voice assistant application for experimenting with state-of-the-art transcription, response generation, and text-to-speech models. Supports OpenAI, Groq, ElevenLabs, CartesiaAI, and Deepgram APIs, plus local models via Ollama. Ideal for research and development in voice technology.

License: MIT License

VERBI - Voice Assistant 🎙️

Motivation ✨✨✨

Welcome to the Voice Assistant project! 🎙️ Our goal is to create a modular voice assistant application that allows you to experiment with state-of-the-art (SOTA) models for various components. The modular structure provides flexibility, enabling you to pick and choose between different SOTA models for transcription, response generation, and text-to-speech (TTS). This approach facilitates easy testing and comparison of different models, making it an ideal platform for research and development in voice assistant technologies. Whether you're a developer, researcher, or enthusiast, this project is for you!

Features 🧰

  • Modular Design: Easily switch between different models for transcription, response generation, and TTS.
  • Support for Multiple APIs: Integrates with the OpenAI, Groq, Deepgram, and ElevenLabs APIs, plus local options via Ollama, FastWhisperAPI, and MeloTTS.
  • Audio Recording and Playback: Record audio from the microphone and play generated speech.
  • Configuration Management: Centralized configuration in config.py for easy setup and management.

Project Structure 📂

voice_assistant/
├── voice_assistant/
│   ├── __init__.py
│   ├── audio.py
│   ├── api_key_manager.py
│   ├── config.py
│   ├── transcription.py
│   ├── response_generation.py
│   ├── text_to_speech.py
│   ├── utils.py
│   ├── local_tts_api.py
│   └── local_tts_generation.py
├── .env
├── run_voice_assistant.py
├── setup.py
├── requirements.txt
└── README.md

Setup Instructions 📋

Prerequisites ✅

  • Python 3.10 or higher
  • Virtual environment (recommended)

Step-by-Step Instructions 🔢

  1. 📥 Clone the repository
   git clone https://github.com/PromtEngineer/Verbi.git
   cd Verbi
  2. 🐍 Set up a virtual environment

Using venv:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Using conda:

    conda create --name verbi python=3.10
    conda activate verbi
  3. 📦 Install the required packages
   pip install -r requirements.txt
  4. 🛠️ Set up the environment variables

Create a .env file in the root directory and add your API keys:

    OPENAI_API_KEY=your_openai_api_key
    GROQ_API_KEY=your_groq_api_key
    DEEPGRAM_API_KEY=your_deepgram_api_key
    LOCAL_MODEL_PATH=path/to/local/model
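
Before starting the assistant, it can help to confirm that the keys needed by the providers you selected are actually set. The sketch below is an illustration only and is not Verbi's own api_key_manager.py: the `REQUIRED_KEYS` mapping and the `missing_keys` helper are hypothetical names, and the provider-to-variable mapping is assumed from the variable names listed above.

```python
import os

# Hypothetical mapping from provider name to the environment variable it needs.
# Providers absent from this mapping (for example 'ollama' or 'local') are
# assumed to need no API key.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}


def missing_keys(selected_providers, environ=None):
    """Return the sorted names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    needed = {REQUIRED_KEYS[p] for p in selected_providers if p in REQUIRED_KEYS}
    return sorted(name for name in needed if not env.get(name))


if __name__ == "__main__":
    # Example: transcription and responses via Groq, TTS via Deepgram.
    missing = missing_keys(["groq", "groq", "deepgram"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

Passing the environment as an optional argument keeps the check easy to exercise without touching the real process environment.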
  5. 🧩 Configure the models

Edit config.py to select the models you want to use:

    import os

    class Config:
        # Model selection
        TRANSCRIPTION_MODEL = 'groq'  # Options: 'openai', 'groq', 'deepgram', 'fastwhisperapi', 'local'
        RESPONSE_MODEL = 'groq'       # Options: 'openai', 'groq', 'ollama', 'local'
        TTS_MODEL = 'deepgram'        # Options: 'openai', 'deepgram', 'elevenlabs', 'local', 'melotts'

        # API keys and paths
        OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
        GROQ_API_KEY = os.getenv("GROQ_API_KEY")
        DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
        LOCAL_MODEL_PATH = os.getenv("LOCAL_MODEL_PATH")

If you are running an LLM locally via Ollama, make sure the Ollama server is running before starting Verbi.
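
One way to check that a local server is up before launching the assistant is to attempt a TCP connection to it. The following is a minimal sketch, not part of Verbi: `is_server_listening` is a hypothetical helper, and the port 11434 is Ollama's default, which is assumed here and may differ if your installation was configured otherwise.

```python
import socket


def is_server_listening(host="127.0.0.1", port=11434, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    if is_server_listening():
        print("Ollama appears to be reachable.")
    else:
        print("Nothing is listening on port 11434; start Ollama first.")
```

This only confirms that something accepts connections on the port; it does not verify that the process is Ollama or that a model has been pulled.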

  6. 🔊 Configure ElevenLabs Jarvis' Voice
  • Voice samples here.
  • Follow this link to add the Jarvis voice to your ElevenLabs account.
  • Name the voice 'Paul J.' or, if you prefer a different name, ensure it matches the ELEVENLABS_VOICE_ID variable in the text_to_speech.py file.
  7. 🏃 Run the voice assistant
   python run_voice_assistant.py
  8. 🎤 Install FastWhisperAPI

    Optional step if you need a local transcription model

    Clone the repository

       cd ..
       git clone https://github.com/3choff/FastWhisperAPI.git
       cd FastWhisperAPI

    Install the required packages:

       pip install -r requirements.txt

    Run the API

       fastapi run main.py

    Alternative Setup and Run Methods

    The API can also run in a Docker container or in Google Colab.

    Docker:

    Build a Docker container:

       docker build -t fastwhisperapi .

    Run the container

       docker run -p 8000:8000 fastwhisperapi

    Refer to the repository documentation for the Google Colab method: https://github.com/3choff/FastWhisperAPI/blob/main/README.md

  9. 🎤 Install Local TTS - MeloTTS

    Optional step if you need a local Text to Speech model

    Install MeloTTS from GitHub

    Use the following link to install MeloTTS for your operating system.

    Once the package is installed in your local virtual environment, you can start the API server using the following command.

       python voice_assistant/local_tts_api.py

    The local_tts_api.py file implements a FastAPI server that listens for incoming text and generates audio using the MeloTTS model. To use the local TTS model, update the config.py file by setting:

       TTS_MODEL = 'melotts'        # Options: 'openai', 'deepgram', 'elevenlabs', 'local', 'melotts'

    You can then run the main script (run_voice_assistant.py) to start using Verbi with local models.

Model Options ⚙️

Transcription Models 🎤

  • OpenAI: Uses OpenAI's Whisper model.
  • Groq: Uses Groq's Whisper-large-v3 model.
  • Deepgram: Uses Deepgram's transcription model.
  • FastWhisperAPI: Uses FastWhisperAPI, a local transcription API powered by Faster Whisper.
  • Local: Placeholder for a local speech-to-text (STT) model.

Response Generation Models 💬

  • OpenAI: Uses OpenAI's GPT-4 model.
  • Groq: Uses Groq's LLaMA model.
  • Ollama: Uses any model served via Ollama.
  • Local: Placeholder for a local language model.

Text-to-Speech (TTS) Models 🔊

  • OpenAI: Uses OpenAI's TTS model with the 'fable' voice.
  • Deepgram: Uses Deepgram's TTS model with the 'aura-angus-en' voice.
  • ElevenLabs: Uses ElevenLabs' TTS model with the 'Paul J.' voice.
  • Local: Placeholder for a local TTS model.
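
Because each setting accepts only a fixed set of option names, a typo in config.py can be caught early with a small check. The sketch below is illustrative and not part of Verbi: `VALID_OPTIONS`, `validate_selection`, and `ExampleConfig` are hypothetical names, and the allowed values are copied from the option comments in the config.py excerpt in the setup steps.

```python
# Allowed option names, taken from the comments in the config.py excerpt.
VALID_OPTIONS = {
    "TRANSCRIPTION_MODEL": {"openai", "groq", "deepgram", "fastwhisperapi", "local"},
    "RESPONSE_MODEL": {"openai", "groq", "ollama", "local"},
    "TTS_MODEL": {"openai", "deepgram", "elevenlabs", "local", "melotts"},
}


def validate_selection(config):
    """Return a list of problems; an empty list means every setting is valid."""
    problems = []
    for setting, allowed in VALID_OPTIONS.items():
        value = getattr(config, setting, None)
        if value not in allowed:
            problems.append(f"{setting}={value!r} is not one of {sorted(allowed)}")
    return problems


class ExampleConfig:
    """Stand-in for the real Config class, for illustration only."""
    TRANSCRIPTION_MODEL = "groq"
    RESPONSE_MODEL = "groq"
    TTS_MODEL = "deepgram"


if __name__ == "__main__":
    issues = validate_selection(ExampleConfig)
    for issue in issues:
        print(issue)
    if not issues:
        print("Model selection looks valid.")
```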

Detailed Module Descriptions 📘

  • run_voice_assistant.py: Main script to run the voice assistant.
  • voice_assistant/config.py: Manages configuration settings and API keys.
  • voice_assistant/api_key_manager.py: Handles retrieval of API keys based on configured models.
  • voice_assistant/audio.py: Functions for recording and playing audio.
  • voice_assistant/transcription.py: Manages audio transcription using various APIs.
  • voice_assistant/response_generation.py: Handles generating responses using various language models.
  • voice_assistant/text_to_speech.py: Manages converting text responses into speech.
  • voice_assistant/utils.py: Contains utility functions like deleting files.
  • voice_assistant/local_tts_api.py: Contains the API implementation to run the MeloTTS model.
  • voice_assistant/local_tts_generation.py: Contains the code that uses the MeloTTS API to generate audio.
  • voice_assistant/__init__.py: Initializes the voice_assistant package.
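
Taken together, the modules above suggest a record, transcribe, respond, synthesize, play loop. The sketch below shows one turn of such a loop with each stage passed in as a callable. It is an assumption about how the pieces could fit together, not the code in run_voice_assistant.py; `run_turn` and all of its parameter names are hypothetical.

```python
from typing import Callable


def run_turn(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> str:
    """Run one conversational turn and return the assistant's text reply."""
    audio_in = record()                 # capture microphone audio
    user_text = transcribe(audio_in)    # speech-to-text
    reply_text = respond(user_text)     # language model response
    audio_out = synthesize(reply_text)  # text-to-speech
    play(audio_out)                     # play the generated speech
    return reply_text


if __name__ == "__main__":
    # Stub stages so the sketch runs without API keys or audio hardware.
    played = []
    reply = run_turn(
        record=lambda: b"fake-audio",
        transcribe=lambda audio: "hello",
        respond=lambda text: f"You said: {text}",
        synthesize=lambda text: text.encode("utf-8"),
        play=played.append,
    )
    print(reply)  # prints "You said: hello"
```

Injecting the stages as callables mirrors the modular design: swapping a provider means passing a different function, and the loop itself can be tested with stubs.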

Roadmap 🛤️🛤️🛤️

Here's what's next for the Voice Assistant project:

  1. Add Support for Streaming: Enable real-time streaming of audio input and output.
  2. Add Support for ElevenLabs and Enhanced Deepgram for TTS: Integrate additional TTS options for higher quality and variety.
  3. Add Filler Audio: Play background or filler audio while waiting for model responses to improve the user experience.
  4. Add Support for Local Models Across the Board: Expand support for local models in transcription, response generation, and TTS.

Contributing 🤝

We welcome contributions from the community! If you'd like to help improve this project, please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes and commit them (git commit -m 'Add new feature').
  4. Push to the branch (git push origin feature-branch).
  5. Open a pull request detailing your changes.

Contributors

  • promtengineer
  • 3choff