🔊😊 A FastAPI voice-assistant framework to quickly prototype LLM-powered voice assistants in <5 minutes.

Nala

Nala is a voice-assistant framework for building and prototyping voice assistants in under 5 minutes, set within the emerging large-language-model (LLM) landscape. With Nala you can easily integrate state-of-the-art (SOTA) transcription such as the Whisper API, text-to-speech synthesis engines such as Microsoft's SpeechT5 model, and LLMs such as Dolly-v2-3b behind a clean front end, triggered by any wake word you choose via the Web Speech API.

Here are some of Nala's key features:

  • Extensible Architecture: Nala offers a flexible and modular, python-centric FastAPI architecture that allows developers to extend its functionality with ease. Integrate new response models or TTS voice skins into your projects effortlessly.
  • Native LLM Integration: Nala integrates directly with the Dolly-v2-3b LLM and makes it easy to integrate others through an easy-to-follow strategy built on helper functions.
  • Multi-Platform Support: Nala is designed to work seamlessly across various platforms and operating systems (e.g. Mac/Linux and Chrome/Safari). Whether you're building web applications, mobile apps, or even IoT devices, Nala can be easily integrated into your technology stack.
  • Audio-to-Audio API: Nala's FastAPI design lets you submit an audio file and get an audio file back through the query-response model. Few projects show how to do this, so it may help accelerate learning for your voice assistant projects.
  • Simple UI: Nala provides a simple user interface where users can quickly rate responses with a thumbs-up or thumbs-down, which aids in building models with Reinforcement Learning from Human Feedback (RLHF).
  • Privacy and Security: Nala restricts data downloads to the superusers specified in settings.json and authenticates users and sessions with standard JSON web tokens. Other features, such as encryption at rest, deletion of audio files, and other privacy-preserving defaults, are currently being worked on.
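
To make the audio-to-audio, query-response idea concrete, here is a minimal sketch of the flow in plain Python. It is not Nala's actual code: the names transcribe, synthesize, RESPONDERS, and audio_to_audio are hypothetical, and the transcription and text-to-speech steps are stubbed so the example runs without any models. The "echo" response type appears in the settings section further down; that it simply repeats the query is an assumption.

```python
from typing import Callable, Dict


# Hypothetical stand-ins: a real deployment would call a speech-to-text
# model (e.g. Whisper) and a TTS engine (e.g. SpeechT5 or Bark) here.
def transcribe(audio: bytes) -> str:
    """Stub transcription: treat the audio payload as UTF-8 text."""
    return audio.decode("utf-8")


def synthesize(text: str) -> bytes:
    """Stub text-to-speech: return the text as bytes instead of a waveform."""
    return text.encode("utf-8")


# Registry of response models keyed by response type.
# "echo" is assumed to repeat the query; an LLM-backed entry would sit beside it.
RESPONDERS: Dict[str, Callable[[str], str]] = {
    "echo": lambda query: query,
}


def audio_to_audio(audio: bytes, response_type: str = "echo") -> bytes:
    """Run the query-response flow: audio in -> text -> reply -> audio out."""
    if response_type not in RESPONDERS:
        raise ValueError(f"unknown response_type: {response_type!r}")
    query = transcribe(audio)
    reply = RESPONDERS[response_type](query)
    return synthesize(reply)
```

Adding a new response model in this sketch means registering one more callable in RESPONDERS, which mirrors the extensibility described in the feature list.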

Note that this is version 2.0, a web-enabled version of a prior voice assistant app here.

getting started

mac (locally)

Install basic dependencies:

brew install ffmpeg
git clone git@github.com:jim-schwoebel/nala_assistant.git
cd nala_assistant
virtualenv env 
source env/bin/activate
pip3 install -r requirements.txt
pip3 install git+https://github.com/suno-ai/bark.git

Generate secret keys for the SESSION_SECRET, JWT_SECRET_KEY, and JWT_REFRESH_SECRET_KEY environment variables by running the following line 3 times, once per key, and save the values in the .env file:

python -c 'import secrets; print(secrets.token_hex())'

You also need a WEB_URL and a TERMS_URL, pointing to your website and its terms of use, respectively. These also go in the .env file.
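
If you would rather produce all five entries in one go, the following sketch builds the .env contents with a fresh secret per key. It assumes the file uses plain KEY=value lines (the usual dotenv layout); the build_env helper is hypothetical, and the URLs you pass in are your own.

```python
import secrets

# The three secret names listed above.
SECRET_NAMES = ("SESSION_SECRET", "JWT_SECRET_KEY", "JWT_REFRESH_SECRET_KEY")


def build_env(web_url: str, terms_url: str) -> str:
    """Return .env file contents with one fresh random secret per key."""
    # secrets.token_hex() with no argument uses the module's default entropy.
    lines = [f"{name}={secrets.token_hex()}" for name in SECRET_NAMES]
    lines.append(f"WEB_URL={web_url}")
    lines.append(f"TERMS_URL={terms_url}")
    return "\n".join(lines) + "\n"
```

For example, printing build_env("http://127.0.0.1:8000", "http://127.0.0.1:8000/terms") gives text you can paste into .env; the /terms path is only a placeholder.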

To open and edit .env file:

nano .env

Then run the app:

uvicorn app:app --reload

Note that if you are having trouble with the uvicorn app:app --reload command, you can try:

python3 -m uvicorn app:app --reload

Sometimes this makes it work.

You can now visit localhost (http://127.0.0.1:8000) to use the application.

linux with GPU (locally)

Install basic dependencies:

sudo apt-get install ffmpeg
git clone git@github.com:jim-schwoebel/nala_assistant.git
cd nala_assistant
virtualenv env 
source env/bin/activate
pip3 install -r gpu_requirements.txt
pip3 install git+https://github.com/suno-ai/bark.git

Generate secret keys for the SESSION_SECRET, JWT_SECRET_KEY, and JWT_REFRESH_SECRET_KEY environment variables by running the following line 3 times, once per key, and save the values in the .env file:

python -c 'import secrets; print(secrets.token_hex())'

You also need a WEB_URL and a TERMS_URL, pointing to your website and its terms of use, respectively. These also go in the .env file.

To open and edit .env file:

nano .env

Then run the app:

uvicorn app:app --reload

You can now visit localhost (http://127.0.0.1:8000) to use the application.

api docs (locally)

Once you have set up the app locally, you can reach the API docs at http://127.0.0.1:8000/docs (Swagger) or http://127.0.0.1:8000/redoc (ReDoc). The Swagger docs are the recommended set, as they have greater support for authentication with JSON web tokens and for the audio-to-audio routes. A screenshot of the docs is shown below to give you an idea of what they look like. The docs that FastAPI auto-generates make it much easier to expand the routes to suit your particular needs as a developer.
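
Alongside the two rendered pages, FastAPI also serves the raw OpenAPI schema, by default at /openapi.json. The sketch below assumes Nala keeps that default path and shows one way to list the available routes from the schema. The helper names list_routes and fetch_schema are hypothetical.

```python
import json
from urllib.request import urlopen

# Keys under an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(schema: dict) -> list:
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the schema from a running server (FastAPI's default path)."""
    with urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)
```

With the app running locally, list_routes(fetch_schema()) returns the method and path of every documented route, which is a quick way to see what you can extend.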

deploying to server (externally)

Follow these instructions to deploy on a server.

  1. Buy a domain on namecheap.com.
  2. Create a Vultr account and forward the domain's DNS to Cloudflare. Note that you will need at least 1 NVIDIA V100 GPU for a seamless user experience with the Bark model and LLMs like Dolly.
  3. Get a cert.pem and a private.pem file from Cloudflare for the server.
  4. Create a virtual machine on Vultr or a similar platform, then point a DNS record on Cloudflare at the IP address of the host (an A record, since a CNAME can only target a hostname).
  5. Set up the server with at least 1 NVIDIA V100 GPU (e.g. pip3 install -r gpu_requirements.txt), as described in the linux with GPU (locally) section above.
  6. Run the gunicorn command below on the server (it uses uvicorn workers).

Enable the firewall rules for HTTP (port 80) and SSL (port 443), then start the server:

sudo ufw allow 80
sudo ufw allow 443
nohup gunicorn --bind {ip_address}:443 main:app --certfile=cert.pem --keyfile=private.pem --workers 10 --graceful-timeout 30 -t 30 --worker-class=uvicorn.workers.UvicornWorker </dev/null &>/dev/null &

The trailing </dev/null &>/dev/null & runs the command as a background job with its input and output detached from the terminal. Replace {ip_address} with the server's IP address.
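
If you prefer to launch the server from a script rather than a shell one-liner, the sketch below rebuilds the same argument list in Python and fills in the IP address. The helper names gunicorn_command and start_in_background are hypothetical; the flags are copied from the command above, and start_new_session is used here as a rough stand-in for nohup on POSIX systems.

```python
import subprocess


def gunicorn_command(ip_address: str, workers: int = 10) -> list:
    """Build the gunicorn argument list from the command shown above."""
    return [
        "gunicorn",
        "--bind", f"{ip_address}:443",
        "main:app",
        "--certfile=cert.pem",
        "--keyfile=private.pem",
        "--workers", str(workers),
        "--graceful-timeout", "30",
        "-t", "30",
        "--worker-class=uvicorn.workers.UvicornWorker",
    ]


def start_in_background(ip_address: str) -> subprocess.Popen:
    """Launch gunicorn detached from the terminal's input and output."""
    return subprocess.Popen(
        gunicorn_command(ip_address),
        stdin=subprocess.DEVNULL,   # equivalent of </dev/null
        stdout=subprocess.DEVNULL,  # equivalent of &>/dev/null
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # detach from the terminal's session
    )
```

Calling gunicorn_command("203.0.113.10") on its own is a harmless way to inspect the arguments before starting anything; the address is a documentation placeholder.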

settings

Here are the current settings that you can edit in the settings.json file:

{"website_name": "Nala",
    "wake_word": "hey", 
    "super_users": ["[email protected]"],
    "audio_delete": {"default": false, "options": [true,false]},
    "sounds": {"default": "chime", "options": ["chime", "bell"]}, 
    "voice": {"default": "bark", "options": ["microsoft", "bark"]}, 
    "response_type": {"default": "dolly", "options": ["blender","dolly", "echo"]}, 
    "language": {"default": "en-us", "options": ["en-us"]}}

In this file you can edit the website name, wake word, super_users (registered users who can download data), sounds (after a query), voice (response skin), response_type (e.g. LLM models), and language (only en-us is supported for now). Note that the options listed here are currently the only ones provided in the repository, but the framework makes them easy to extend later in the helpers.py file.
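
When editing settings.json by hand, it is easy to set a default that is not among the listed options. The sketch below shows one way to catch that before starting the app. It is not part of the repository: load_settings and CHOICE_KEYS are hypothetical names, and the example text mirrors the settings above with an empty super_users list.

```python
import json

# Keys whose values have the {"default": ..., "options": [...]} shape.
CHOICE_KEYS = ("audio_delete", "sounds", "voice", "response_type", "language")


def load_settings(text: str) -> dict:
    """Parse settings JSON and check each default is one of its options."""
    settings = json.loads(text)
    for key in CHOICE_KEYS:
        entry = settings[key]
        if entry["default"] not in entry["options"]:
            raise ValueError(
                f"{key}: default {entry['default']!r} not in options {entry['options']}"
            )
    return settings


EXAMPLE = """
{"website_name": "Nala", "wake_word": "hey", "super_users": [],
 "audio_delete": {"default": false, "options": [true, false]},
 "sounds": {"default": "chime", "options": ["chime", "bell"]},
 "voice": {"default": "bark", "options": ["microsoft", "bark"]},
 "response_type": {"default": "dolly", "options": ["blender", "dolly", "echo"]},
 "language": {"default": "en-us", "options": ["en-us"]}}
"""

settings = load_settings(EXAMPLE)
```

To check your real file, pass its contents instead, for example load_settings(open("settings.json").read()).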

browser compatibility

Currently, Nala works on Chrome- and Safari-based browsers, which implement the Web Speech API. If you load Nala in any other browser, it will give an error message like this.

Note that you can find a current list of browsers that support the Web Speech API here or in the figure below.

maintainers

This project was incubated through the Erdos Fellowship program and has since grown into a larger independent initiative.

Here is a list of active maintainers of this project:

  • Jim - chief maintainer, Erdos Institute mentor
  • Jin - Erdos Institute fellow
  • Nathan - Erdos Institute fellow
  • Collin - Data scientist @ Indeed.com (project advisor)

If you'd like to help maintain this project, reach out to Jim Schwoebel @ [email protected] and he can invite you to our weekly call to ship PRs and delegate work in our sprint cycle.

references

Here is a quick list of references for additional reading.

future tools used

  • auth0 - authentication / tokens
  • minio - object storage platform
