deepgram-devs / deepgram-conversational-demo

This project forked from deepgram-starters/nextjs-live-transcription

Deepgram Conversational AI demo

Home Page: https://emilyai.deepgram.com

License: MIT License

Languages: TypeScript 95.33%, CSS 2.67%, JavaScript 2.00%
Topics: asr, deepgram, nextjs, react, stt, tts, vercel

deepgram-conversational-demo's Introduction

Deepgram AI Agent Technical Demo

Combine Text-to-Speech and Speech-to-Text into a conversational agent.

Project codename EmilyAI

Discord

This demo shows how to build a conversational AI application with Deepgram that engages users in natural-language interactions and mimics human conversation.

Examples of this type of application include virtual assistants for tasks like answering queries and controlling smart devices, educational tutors for personalized learning, healthcare advisors for medical information, and entertainment chatbots for engaging conversations and games.

These applications aim to enhance user experiences by offering efficient and intuitive interactions, reducing the need for human intervention in various tasks and services.

Issue Reporting

If you have found a bug or have a feature request, please report it in this repository's issues section. Please do not report security vulnerabilities on the public GitHub issue tracker.

Check out our KNOWN ISSUES before reporting.

Demo features

What is Deepgram?

Deepgram is a foundational AI company providing speech-to-text and language understanding capabilities to make data readable and actionable by humans or machines.

Sign up for Deepgram

Want to start building using this project? Sign up now for Deepgram and create an API key.

Quickstart

Manual

Follow these steps to get started with this starter application.

Clone the repository

Go to GitHub and clone the repository.

Install dependencies

Install the project dependencies.

npm install

Edit the config file

Create a new file called .env.local and copy the contents of sample.env.local into it.

DEEPGRAM_STT_DOMAIN=https://api.deepgram.com
DEEPGRAM_API_KEY=YOUR-DG-API-KEY
OPENAI_API_KEY=YOUR-OPENAI-API-KEY
  1. For DEEPGRAM_API_KEY paste in the key you generated in the Deepgram console.
  2. Set DEEPGRAM_STT_DOMAIN to be https://api.deepgram.com.
  3. OPENAI_API_KEY should be an OpenAI API Key that can access the chat completions API.
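
As a minimal sketch (not part of the repository), the three variables above could be checked before the app starts. The helper name `missingEnvVars` and the rule that values starting with `YOUR-` count as unset are assumptions made for illustration.

```typescript
// Hypothetical helper: lists required variables that are unset, empty,
// or still hold a "YOUR-..." placeholder copied from sample.env.local.
const REQUIRED_VARS = [
  "DEEPGRAM_STT_DOMAIN",
  "DEEPGRAM_API_KEY",
  "OPENAI_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

function missingEnvVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name]?.trim();
    // Treat empty values and untouched placeholders as missing.
    return !value || value.startsWith("YOUR-");
  });
}
```

In a Node.js context this could be called as `missingEnvVars(process.env)`, failing fast with a readable message if the returned list is not empty.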

Run the application

Start the development server with the command below. Once it is running, you can access the application in your browser.

npm run dev

Getting Help

We love to hear from you, so if you have questions or comments, or find a bug in the project, let us know!

Author

Deepgram

License

This project is licensed under the MIT license. See the LICENSE file for more info.

deepgram-conversational-demo's People

Contributors

butzhang, damiendeepgram, jpvajda, lukeocodes, michellelychan, natalierutgers, raivaibhav, semantic-release-bot

deepgram-conversational-demo's Issues

unable to reconnect

What is the current behavior?

unable to re-establish connection after a period of inactivity

Steps to reproduce

Open https://emilyai.deepgram.com/ and leave it idle for about 20 minutes. The toast "The connection to Deepgram closed, we'll attempt to reconnect." then pops up repeatedly, and transcription doesn't work.

Expected behavior

connection re-establishes

Please tell us about your environment

This is in production, on Chrome and macOS.

Other information

I documented more information and logging here:
https://discord.com/channels/1108042150941294664/1245769351689146491/1245769351689146491

I did some debugging + logging on a forked version of this repo, and in my forked repo the loop is

  • connection closes due to inactivity
  • tries to reopen connection
  • connection is reopened
  • connection closes immediately without a reason (???)
  • loop restarts
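
The loop above retries as fast as the connection closes. The sketch below does not explain why the reopened connection closes immediately; it only illustrates one common way to keep such a loop from spinning: exponential backoff with a cap and an attempt limit. All names and constants are hypothetical and are not taken from the repository.

```typescript
// Hypothetical reconnect policy; every value here is illustrative.
interface ReconnectPolicy {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  maxAttempts: number; // give up after this many consecutive failures
  stableAfterMs: number; // a connection open this long counts as healthy
}

const DEFAULT_POLICY: ReconnectPolicy = {
  baseDelayMs: 500,
  maxDelayMs: 8000,
  maxAttempts: 6,
  stableAfterMs: 5000,
};

// Delay before retry number `attempt` (0-based), or null to stop retrying.
function nextDelayMs(
  attempt: number,
  policy: ReconnectPolicy = DEFAULT_POLICY
): number | null {
  if (attempt >= policy.maxAttempts) return null;
  return Math.min(policy.baseDelayMs * 2 ** attempt, policy.maxDelayMs);
}

// New consecutive-failure count after a connection closes. A connection
// that closed almost immediately counts as a failure; a long-lived one
// resets the counter.
function attemptsAfterClose(
  attempts: number,
  openDurationMs: number,
  policy: ReconnectPolicy = DEFAULT_POLICY
): number {
  return openDurationMs >= policy.stableAfterMs ? 0 : attempts + 1;
}
```

With this policy, a connection that closes right after reopening increases the wait before the next attempt, and the UI could show a single "could not reconnect" message once `nextDelayMs` returns null instead of repeating the toast.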

Potential incomplete-is-final error occurring in app

Re: EmilyAI. I periodically get to points in the conversation where the STT is clearly good, done, and waiting, but the response mechanism isn't triggered (I don't see the STT latency thing pop up). I can just sit there and wait, and nothing happens until I speak again, at which point it will probably pick up and go again. Any idea what's going on?

Use the UtteranceEnd event: treat it like speech_final=true and flush the utterance.
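
The suggestion above can be sketched as a small buffer that flushes on either signal. This is an illustration under assumptions, not the app's actual code: the class name `UtteranceBuffer` is hypothetical, and the message type is flattened so that `transcript` sits at the top level, whereas real payloads nest it and carry more fields.

```typescript
// Simplified, flattened shapes of the two message kinds involved.
type SttMessage =
  | {
      type: "Results";
      is_final: boolean;
      speech_final: boolean;
      transcript: string;
    }
  | { type: "UtteranceEnd" };

class UtteranceBuffer {
  private parts: string[] = [];

  // Returns the completed utterance when one should be sent on,
  // otherwise null.
  handle(message: SttMessage): string | null {
    if (message.type === "Results") {
      const text = message.transcript.trim();
      // Only finalized, non-empty segments are kept.
      if (message.is_final && text !== "") this.parts.push(text);
      return message.speech_final ? this.flush() : null;
    }
    // UtteranceEnd: treat it like speech_final=true.
    return this.flush();
  }

  private flush(): string | null {
    if (this.parts.length === 0) return null;
    const utterance = this.parts.join(" ");
    this.parts = [];
    return utterance;
  }
}
```

Because `flush` returns null when nothing is buffered, an UtteranceEnd that arrives after a speech_final flush does not trigger a second, empty response.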

The model is reading ¡ as "exclamation"

I said: Yep. Uh, the thing is I speak Spanish, so I'm wondering if, uh, we can talk in Spanish.
Asteria
Today at 1:54 AM
¡Hola! I can understand Spanish, but I'll respond in English. How can I help you today?

"Exclamation mark" Hola "Exclamation mark" ......

She is also reading: https:// literally.
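
One possible workaround, sketched below, is to clean the text before it is sent to text-to-speech. It addresses only the two symptoms reported here and does not change how the voice model reads other symbols. The function name `prepareForSpeech` is hypothetical.

```typescript
// Hypothetical pre-processing step applied to text before TTS.
function prepareForSpeech(text: string): string {
  return text
    .replace(/[¡¿]/g, "") // drop inverted punctuation so it is not read aloud
    .replace(/\bhttps?:\/\//gi, "") // drop the URL scheme, keep the host and path
    .replace(/\s{2,}/g, " ") // collapse any doubled whitespace left behind
    .trim();
}
```

The original text can still be shown in the chat window; only the copy sent for synthesis needs to be cleaned.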

Mobile UX Tweaks

Small tweaks to the mobile user experience, like scroll position on opening keyboard, buttons and icons in the controls bar, etc.

Speech to text is not working

Speech to text is not working

The code mentions a different API for TTS and STT, but Deepgram offers one key for both.

iOS Autoplay Issues

This seems to be a well-known issue with iOS devices: Apple requires a user event before audio can play in the browser.

A possible fix is to use the launch-page click that starts the app to play a silent MP3 file, and then change the context of that audio object when playing audio from the queue.

See: https://matt-harrison.com/posts/web-audio/

Unknown error displayed when service tries to establish websocket connection

What is the current behavior?

After pulling the repo, everything runs normally. However, when I press Ctrl-C to exit and then run npm run dev again, it starts showing the error and, under the hood, keeps retrying.
Once I changed my API key, it went back to normal.

What's happening that seems wrong?
It shouldn't show an error message.

Steps to reproduce

Mentioned above

To make it faster to diagnose the root problem, tell us how we can reproduce the bug.
mentioned above

Expected behavior

What would you expect to happen when following the steps above?

Please tell us about your environment

node --version
v20.11.0

We want to make sure the problem isn't specific to your operating system or programming language.

  • Operating System/Version: macOS
  • Language: Next.js
  • Browser: Chrome

Other information

Anything else we should know? (e.g. detailed explanation, stack-traces, related issues, suggestions how to fix, links for us to have context, eg. stack overflow, codepen, etc)

Add an emoji picker

Proposed changes

I just realized there should be an option to add emoji. Why? Emoji are an easy way to express emotions.

Context

Better UX? (I am not a subject-matter expert here.)

Possible Implementation

https://github.com/missive/emoji-mart I think this is perfect for this use case.

Other information

I would be happy to contribute

Option to deactivate barge-in

Proposed changes

Deactivate barge-in

Context

At times (e.g. Educational content), it's necessary to ensure that the queue is processed without interruptions. Currently, adding messages to the queue can cause disruptions, leading to the current audio being interrupted and restarted.

BTW, love your work, guys.

Thanks.
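
As a rough sketch of what such an option could look like, the queue below takes an `allowBargeIn` flag. It assumes that barge-in means "user speech clears pending audio and stops playback", which may not match how the app's queue actually works; the class and method names are hypothetical.

```typescript
// Hypothetical playback queue with a switch for barge-in.
class PlaybackQueue<T> {
  private items: T[] = [];

  constructor(private readonly allowBargeIn: boolean) {}

  enqueue(item: T): void {
    this.items.push(item);
  }

  // Next item to play, or undefined when the queue is empty.
  next(): T | undefined {
    return this.items.shift();
  }

  get length(): number {
    return this.items.length;
  }

  // Called when the user starts speaking. Returns true if the caller
  // should stop the current audio; pending items are dropped in that case.
  onUserSpeech(): boolean {
    if (!this.allowBargeIn) return false;
    this.items = [];
    return true;
  }
}
```

With `allowBargeIn` set to false, queued items play in order regardless of user speech, which is the behavior requested for educational content.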

Deepgram Text to Speech not speaking math equations properly

What is the current behavior?

Math equations are not spoken properly by the Deepgram Text to Speech service.

Steps to reproduce

Write any math equation in the input.

To make it faster to diagnose the root problem, tell us how we can reproduce the bug.

Echo cancellation doesn't always work in Chrome/Chromium browsers

Caused by a user not using peer-devices. Here is the ticket: https://bugs.chromium.org/p/chromium/issues/detail?id=687574 It basically says that echo cancellation only works for audio that comes from a peer connection. As soon as audio is processed locally by the Web Audio API, echo cancellation no longer takes it into account.

A possible fix is to lower the playback volume when you start speaking. This will improve the barge-in experience and possibly duck the playback under the microphone's decibel threshold so that it doesn't pick itself up.
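
The ducking idea can be sketched as two small pure functions: one picks a target gain from the microphone level, the other moves the current gain toward it in limited steps to avoid audible jumps. The threshold, gain values, and function names are assumptions for illustration, not measured or repository values.

```typescript
// Hypothetical ducking parameters.
interface DuckingOptions {
  speechThresholdDb: number; // mic level at or above this counts as speech
  duckedGain: number; // playback gain (0..1) while the user speaks
}

const DEFAULT_DUCKING: DuckingOptions = {
  speechThresholdDb: -45,
  duckedGain: 0.2,
};

// Target playback gain for the current microphone level.
function targetGain(
  micLevelDb: number,
  opts: DuckingOptions = DEFAULT_DUCKING
): number {
  return micLevelDb >= opts.speechThresholdDb ? opts.duckedGain : 1;
}

// Moves `current` toward `target` by at most `maxStep` per call.
function stepGain(current: number, target: number, maxStep = 0.1): number {
  const delta = target - current;
  if (Math.abs(delta) <= maxStep) return target;
  return current + Math.sign(delta) * maxStep;
}
```

In a browser, the result of `stepGain` could be written to the `gain.value` of a Web Audio `GainNode` placed in the playback path, called once per analysis frame.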

Microphone stops listening once I change the voice

The microphone stops listening once I change the voice.
I refreshed many times; it did not work.
Windows 11
Chrome Version 122.0.6261.95
I had to clean the cache to be able to access the conversation.

UX Improvements

Proposed changes

  • Selecting the model before initialization: give the user an option to select the model?
  • Text area with limited rows instead of an input? Right now, pressing Shift+Enter acts as Enter.
  • Maintaining the state of the chosen model on refresh?
  • Mobile view fixes: the logo image is too wide for small screens, and long text messages don't break onto another line.
  • Performance optimization? (Check Lighthouse.)

Context

A few of my initial findings from using the app on my phone. The first thing I noticed after logging in to Deepgram is that it has different models, but there is no option to choose one.

Possible Implementation

Not obligatory, but suggest an idea for implementing addition or change

Other information

I would be happy to contribute.
