deepgram-devs / deepgram-conversational-demo

This project is forked from deepgram-starters/nextjs-live-transcription


Deepgram Conversational AI demo

Home Page: https://emilyai.deepgram.com

License: MIT License

JavaScript 2.00% TypeScript 95.33% CSS 2.67%

asr deepgram nextjs react stt tts vercel

deepgram-conversational-demo's Introduction

Deepgram AI Agent Technical Demo

Combine Text-to-Speech and Speech-to-Text into a conversational agent.

Project codename EmilyAI

The purpose of this demo is to showcase how you can build a Conversational AI application that engages users in natural language interactions, mimicking human conversation through natural language processing using Deepgram.

Examples of where you would see this type of application include: virtual assistants for tasks like answering queries and controlling smart devices, educational tutors for personalized learning, healthcare advisors for medical information, and entertainment chat bots for engaging conversations and games.

These applications aim to enhance user experiences by offering efficient and intuitive interactions, reducing the need for human intervention in various tasks and services.

Issue Reporting

If you have found a bug or have a feature request, please report it in this repository's issues section. Please do not report security vulnerabilities on the public GitHub issue tracker.

Check out our KNOWN ISSUES before reporting.

Demo features

Capture streaming audio using Deepgram Streaming Speech to Text.
Natural Language responses using an OpenAI LLM.
Text to Speech conversion using Deepgram Aura Text to Speech.

What is Deepgram?

Deepgram is a foundational AI company providing speech-to-text and language understanding capabilities to make data readable and actionable by humans or machines.

Sign-up to Deepgram

Want to start building using this project? Sign up now for Deepgram and create an API key.

Quickstart

Manual

Follow these steps to get started with this starter application.

Clone the repository

Go to GitHub and clone the repository.

Install dependencies

Install the project dependencies.

npm install

Edit the config file

Create a new file called .env.local and copy the contents of sample.env.local into it.

DEEPGRAM_STT_DOMAIN=https://api.deepgram.com
DEEPGRAM_API_KEY=YOUR-DG-API-KEY
OPENAI_API_KEY=YOUR-OPENAI-API-KEY

For DEEPGRAM_API_KEY paste in the key you generated in the Deepgram console.
Set DEEPGRAM_STT_DOMAIN to be https://api.deepgram.com.
OPENAI_API_KEY should be an OpenAI API Key that can access the chat completions API.
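
A missing or placeholder key is an easy mistake at this step, so a small startup check can help. The following is a minimal sketch, not code from this repository: readConfig and the placeholder list are hypothetical names, and it assumes only the three variables shown above.

```typescript
// Hypothetical helper, not part of the demo's source.
const REQUIRED_KEYS = [
  "DEEPGRAM_STT_DOMAIN",
  "DEEPGRAM_API_KEY",
  "OPENAI_API_KEY",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type AppConfig = Record<RequiredKey, string>;

// Placeholder values from the sample file that must be replaced.
const PLACEHOLDERS = new Set(["YOUR-DG-API-KEY", "YOUR-OPENAI-API-KEY"]);

// Returns the three settings, or throws listing every key that is
// missing, empty, or still set to a placeholder.
function readConfig(env: Record<string, string | undefined>): AppConfig {
  const config = {} as AppConfig;
  const problems: string[] = [];
  for (const key of REQUIRED_KEYS) {
    const value = (env[key] ?? "").trim();
    if (value === "" || PLACEHOLDERS.has(value)) {
      problems.push(key);
    } else {
      config[key] = value;
    }
  }
  if (problems.length > 0) {
    throw new Error(
      `Missing or placeholder values in .env.local: ${problems.join(", ")}`
    );
  }
  return config;
}
```

In a Node.js or Next.js server context this could be called as readConfig(process.env) before any API request is made.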

Run the application

Start the development server with the command below. Once it is running, you can access the application in your browser.

npm run dev

Getting Help

We love to hear from you, so if you have questions or comments, or find a bug in the project, let us know!

Author

Deepgram

License

This project is licensed under the MIT license. See the LICENSE file for more info.


deepgram-conversational-demo's Issues

Setup OpenGraph and metadata

We have some missing OG and meta information.

unable to reconnect

What is the current behavior?

unable to re-establish connection after a period of inactivity

Steps to reproduce

open https://emilyai.deepgram.com/ and leave it dormant for like 20 minutes, then notice that the toast repeatedly pops up "The connection to Deepgram closed, we'll attempt to reconnect." and the transcription doesn't work

Expected behavior

connection re-establishes

Please tell us about your environment

this is in production, chrome and macos

Other information

I documented more information and logging here:
https://discord.com/channels/1108042150941294664/1245769351689146491/1245769351689146491

I did some debugging + logging on a forked version of this repo, and in my forked repo the loop is

connection closes due to inactivity
tries to reopen connection
connection is reopened
connection closes immediately without a reason (???)
loop restarts
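
One way to keep a loop like the one above from retrying forever is to back off between attempts and to count a connection as healthy only after it has stayed open for a while. This is a minimal sketch under that assumption, not the demo's reconnect logic. ReconnectPolicy and all option names are hypothetical, and it does not explain why the reopened connection closes.

```typescript
// Hypothetical reconnect policy, not taken from the demo's source.
interface ReconnectOptions {
  baseMs: number;      // delay before the first retry
  maxMs: number;       // upper bound for any single delay
  maxAttempts: number; // retries allowed before giving up
  stableMs: number;    // how long a connection must stay open to count as healthy
}

class ReconnectPolicy {
  private readonly opts: ReconnectOptions;
  private attempt = 0;
  private openedAt: number | null = null;

  constructor(opts: ReconnectOptions) {
    this.opts = opts;
  }

  // Call when the socket opens. `now` is a timestamp in milliseconds.
  onOpen(now: number): void {
    this.openedAt = now;
  }

  // Call when the socket closes. Returns the delay in milliseconds before
  // the next attempt, or null when the caller should stop retrying.
  onClose(now: number): number | null {
    const wasStable =
      this.openedAt !== null && now - this.openedAt >= this.opts.stableMs;
    this.openedAt = null;
    // Only a connection that stayed open resets the counter, so an
    // "opens, then closes immediately" cycle still backs off.
    if (wasStable) {
      this.attempt = 0;
    }
    if (this.attempt >= this.opts.maxAttempts) {
      return null;
    }
    const delay = Math.min(this.opts.maxMs, this.opts.baseMs * 2 ** this.attempt);
    this.attempt += 1;
    return delay;
  }
}
```

When onClose returns null, the application could show a single persistent error instead of repeating the reconnect toast.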

Errors added to ErrorContextProvider do not display

We have not plugged ANY errors into the ErrorContextProvider yet.

Potential incomplete-is-final error occurring in app

Re: EmilyAI: I periodically get to points in the conversation where the STT is clearly good, done, and waiting, but the response mechanism isn't triggered (I don't see the STT latency indicator pop up). I can just sit there and wait, and nothing happens until I speak again, at which point it will probably pick up and go again. Any idea what's going on?

Using the UtteranceEnd event, treat it like speech_final=true and flush the utterance.
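
The suggestion above can be sketched as a small buffer that collects finalized transcript pieces and flushes them on either signal. This is an illustration only: the event shapes are simplified assumptions (real Deepgram messages nest the transcript and carry more fields), and UtteranceBuffer is a hypothetical name.

```typescript
// Simplified, assumed event shapes for illustration only.
type TranscriptEvent = {
  type: "Results";
  transcript: string;
  is_final: boolean;
  speech_final: boolean;
};
type UtteranceEndEvent = { type: "UtteranceEnd" };
type SttEvent = TranscriptEvent | UtteranceEndEvent;

// Hypothetical buffer: collects finalized pieces and releases the whole
// utterance on speech_final=true or on UtteranceEnd.
class UtteranceBuffer {
  private parts: string[] = [];

  // Returns the complete utterance when it is ready for the LLM, else null.
  handle(event: SttEvent): string | null {
    if (event.type === "Results") {
      const text = event.transcript.trim();
      // Interim results (is_final=false) are skipped because they may be revised.
      if (event.is_final && text !== "") {
        this.parts.push(text);
      }
      return event.speech_final ? this.flush() : null;
    }
    // UtteranceEnd: treat it like speech_final=true.
    return this.flush();
  }

  private flush(): string | null {
    if (this.parts.length === 0) {
      return null;
    }
    const utterance = this.parts.join(" ");
    this.parts = [];
    return utterance;
  }
}
```

Because flush returns null on an empty buffer, an UtteranceEnd that arrives after a speech_final result does not trigger a second response.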

The model is reading ¡ as "exclamation"

I said: Yep. Uh, the thing is I speak Spanish, so I'm wondering if, uh, we can talk in Spanish.
Asteria
Today at 1:54 AM
¡Hola! I can understand Spanish, but I'll respond in English. How can I help you today?

"Exclamation mark" Hola "Exclamation mark" ......

She is also reading: https:// literally.
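
A possible workaround is to clean the LLM response before it is sent to text to speech. This is a minimal sketch, assuming that dropping inverted punctuation and URL schemes is acceptable for spoken output. prepareForSpeech is a hypothetical helper, not part of the demo or of any Deepgram API.

```typescript
// Hypothetical pre-processing step applied to text before it is sent to TTS.
function prepareForSpeech(text: string): string {
  return text
    // Drop inverted punctuation, which the voice reads aloud as words.
    .replace(/[¡¿]/g, "")
    // Drop the URL scheme (and a leading "www.") so only the host and path are spoken.
    .replace(/\bhttps?:\/\/(www\.)?/gi, "")
    // Collapse any whitespace left behind.
    .replace(/\s+/g, " ")
    .trim();
}
```

The text displayed in the chat can stay unchanged; only the copy sent to the speech request would be cleaned.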

Mobile UX Tweaks

Small tweaks to the mobile user experience, like scroll position on opening keyboard, buttons and icons in the controls bar, etc.

Real real-time transcription latency

Speech to text is not working

In this code you refer to different APIs for TTS and STT, but Deepgram offers one key for both.

Mini control panel to allow for changing various settings

Add a control panel that will allow for the adjusting of settings, and that will restart the websocket or other connections when settings change where appropriate.

iOS Autoplay Issues

Seems to be a very well-known issue with iOS devices. Apple requires a user event to play audio in the browser - basically.

A possible fix is to use the launchpage click to start the app to play a zero-audio mp3 file, and change the context of that audio object when playing audio from the queue.

See: https://matt-harrison.com/posts/web-audio/

Delay after unmuting the microphone for a second time

After unmuting the microphone, there is a 10-20s delay before transcriptions start coming back again.

Possible fix is to pause/resume the microphone, and store the mediarecorder interface in state rather than destroy it.

See: https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/pause
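
The pause/resume idea could look like the sketch below. It assumes the recorder is created once and kept (for example in React state or a ref). RecorderLike lists only the MediaRecorder members used here, and setMuted is a hypothetical helper.

```typescript
// Subset of the browser MediaRecorder interface that this sketch relies on.
interface RecorderLike {
  readonly state: "inactive" | "recording" | "paused";
  pause(): void;
  resume(): void;
}

// Hypothetical helper: mute by pausing the existing recorder and unmute by
// resuming it, instead of destroying and recreating it.
function setMuted(recorder: RecorderLike, muted: boolean): void {
  if (muted && recorder.state === "recording") {
    recorder.pause();
  } else if (!muted && recorder.state === "paused") {
    recorder.resume();
  }
  // In any other state (for example "inactive") there is nothing to do.
}
```

A real MediaRecorder instance satisfies RecorderLike structurally, so it can be passed in directly.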

Unknown error displayed when service tries to establish websocket connection

What is the current behavior?

I pulled the repo and everything ran normally. However, after I pressed Ctrl-C to exit and ran npm run dev again, it started to show the error, and under the hood it keeps retrying.
Once I changed my API key, it went back to normal.

What's happening that seems wrong?
It shouldn't show an error message.

Steps to reproduce

Mentioned above

To make it faster to diagnose the root problem. Tell us how can we reproduce the bug.
mentioned above

Expected behavior

What would you expect to happen when following the steps above?

Please tell us about your environment

node --version
v20.11.0

We want to make sure the problem isn't specific to your operating system or programming language.

Operating System/Version: MAC
Language: NextJS
Browser: Chrome

Other information

Anything else we should know? (e.g. detailed explanation, stack-traces, related issues, suggestions how to fix, links for us to have context, eg. stack overflow, codepen, etc)

Add a emoji picker

Proposed changes

I just realized there should be an option to add emoji. Why? Emoji are an easy way to express emotions.

Context

Better UX? (I am not a Subject matter expert, here)

Possible Implementation

https://github.com/missive/emoji-mart I think this is perfect for this use case.

Other information

I would be happy to contribute

Option to deactivate barge-in

Proposed changes

Deactivate barge-in

Context

At times (e.g. Educational content), it's necessary to ensure that the queue is processed without interruptions. Currently, adding messages to the queue can cause disruptions, leading to the current audio being interrupted and restarted.

BTW Love your work Guys.

Thanks.
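
One way to make barge-in optional is to put a flag on the playback queue. This is a minimal sketch, not the demo's queue implementation. SpeechQueue, allowBargeIn, and onUserSpeech are hypothetical names.

```typescript
// Hypothetical playback queue with a switch for barge-in.
class SpeechQueue<T> {
  private items: T[] = [];
  allowBargeIn: boolean;

  constructor(allowBargeIn: boolean) {
    this.allowBargeIn = allowBargeIn;
  }

  // Add a response (for example an audio clip) to the end of the queue.
  enqueue(item: T): void {
    this.items.push(item);
  }

  // Take the next item to play, or undefined when the queue is empty.
  next(): T | undefined {
    return this.items.shift();
  }

  // Number of items still waiting to be played.
  get pending(): number {
    return this.items.length;
  }

  // Call when the user starts speaking. Returns true when the caller
  // should stop the current playback.
  onUserSpeech(): boolean {
    if (!this.allowBargeIn) {
      return false; // keep playing and keep the queue intact
    }
    this.items = []; // discard everything that was waiting
    return true;
  }
}
```

With allowBargeIn set to false, queued educational content plays through even when the microphone picks up speech.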

Better prompt injection protection

Examples of role play and phonetically explicit responses can be prompted.

Deepgram Text to Speech not speaking math equations properly

What is the current behavior?

Math equations are not being spoken properly by Deepgram Text to Speech service

Steps to reproduce

write any math equation in input

To make it faster to diagnose the root problem. Tell us how can we reproduce the bug.

Echocancellation doesn't always work in Chrome/Chromium browsers

Caused by a user not using peer-devices. Here is the ticket: https://bugs.chromium.org/p/chromium/issues/detail?id=687574 It basically says that the echo cancellation only works for audio that is coming from a peer connection. As soon as it is processed locally by the Web Audio API it will not be considered anymore by the echo cancellation.

Possible fix is to go ahead and volume-down the playback when you start speaking. This will improve the barge-in experience, and possibly duck the playback under the microphones' decibel threshold so it doesn't pick itself up.

See: https://stackoverflow.com/questions/10338704/javascript-to-detect-if-user-changes-tab
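
The volume-down idea can be sketched as a function that moves the playback volume a step at a time toward a lower target while the user is speaking. The names and default values are assumptions for illustration. The result is meant for an audio element's volume property, which ranges from 0 to 1.

```typescript
// Hypothetical ducking settings: all values are illustrative defaults.
interface DuckingOptions {
  normal: number; // playback volume while the user is silent
  ducked: number; // playback volume while the user is speaking
  step: number;   // largest change applied per update, to avoid abrupt jumps
}

const DEFAULT_DUCKING: DuckingOptions = { normal: 1, ducked: 0.25, step: 0.5 };

// Returns the volume to apply on this update, moving at most `step`
// toward the target for the current speaking state.
function nextVolume(
  current: number,
  userIsSpeaking: boolean,
  options: DuckingOptions = DEFAULT_DUCKING
): number {
  const target = userIsSpeaking ? options.ducked : options.normal;
  if (current > target) {
    return Math.max(target, current - options.step);
  }
  return Math.min(target, current + options.step);
}
```

The function would be called on each voice activity update, with the result assigned to the playing audio element.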

bug - pressing space while typing a message causes the microphone to toggle

pressing space while typing a message causes the microphone to toggle
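
A likely fix is to ignore the shortcut while focus is in a text field. This is a minimal sketch that assumes the toggle is bound to a keydown handler. The two interfaces list only the event members used, and shouldToggleMicrophone is a hypothetical helper.

```typescript
// Minimal shapes for the parts of a keyboard event this sketch needs.
interface KeyTargetLike {
  tagName?: string;
  isContentEditable?: boolean;
}
interface KeyEventLike {
  code: string;
  target: KeyTargetLike | null;
}

// Hypothetical guard: true only when Space is pressed outside a text field.
function shouldToggleMicrophone(event: KeyEventLike): boolean {
  if (event.code !== "Space") {
    return false;
  }
  const target = event.target;
  if (target === null) {
    return true;
  }
  const tag = (target.tagName ?? "").toUpperCase();
  const isTyping =
    tag === "INPUT" ||
    tag === "TEXTAREA" ||
    tag === "SELECT" ||
    target.isContentEditable === true;
  return !isTyping;
}
```

In a browser handler, the event target would need to be narrowed first, for example by passing { code: e.code, target: e.target as HTMLElement | null }.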

UX Improvements

Proposed changes

Selecting model before initialization, give user an option to select the model?
Text area with limited rows instead of an input? Right now, pressing Shift+Enter acts as Enter.
Maintaining the state of chosen model on refresh?
Mobile view fixes: the logo image is too wide for small screens, and long text messages don't break onto another line?
Performance optimization? (check the lighthouse)

Context

A few of my initial findings while I was on my phone. The first thing I noticed after logging in to Deepgram is that it has different models, but there is no option to choose one.

Possible Implementation

Not obligatory, but suggest an idea for implementing addition or change

Other information

I would be happy to contribute.

STT sometimes hangs and needs extra words to finalise

I think we get an is_final but not a speech_final. Is this a case of tuning utterance_end_ms?
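
For experimenting with that setting, the query string for the streaming request could be built as in the sketch below. It rests on an assumption that should be checked against Deepgram's documentation: that utterance_end_ms only takes effect together with interim_results=true. buildListenQuery is a hypothetical helper.

```typescript
// Hypothetical options for building the streaming query string.
interface ListenOptions {
  utteranceEndMs: number; // silence gap, in milliseconds, before UtteranceEnd is sent
}

// Builds the query string for the streaming endpoint. The interim_results
// flag is always added, on the assumption that utterance_end_ms needs it.
function buildListenQuery(options: ListenOptions): string {
  const params: Record<string, string> = {
    interim_results: "true",
    utterance_end_ms: String(Math.round(options.utteranceEndMs)),
  };
  return Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}
```

The result can be appended after a "?" to the streaming URL. Lower values end an utterance sooner but risk cutting in during natural pauses.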
