
pipecat-ai / pipecat


Open Source framework for voice and multimodal conversational AI

License: BSD 2-Clause "Simplified" License

Python 99.64% Dockerfile 0.36%
ai real-time voice voice-assistant chatbot-framework chatbots

pipecat's People

Contributors

aconchillo, adidoit, ankykong, cbrianhill, chadbailey59, eddieoz, jamsea, jptaylor, kwindla, lazeratops, lewiswolfgang, moishe, rahulunair, tomtom101, weedge, wtlow003


pipecat's Issues

Help installing daily-python on Python 3.12

The conflict is caused by:
pipecat-ai[daily,openai,silero] 0.0.36 depends on daily-python~=0.10.1; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.35 depends on daily-python~=0.10.1; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.34 depends on daily-python~=0.10.1; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.33 depends on daily-python~=0.10.1; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.32 depends on daily-python~=0.10.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.31 depends on daily-python~=0.9.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.30 depends on daily-python~=0.9.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.29 depends on daily-python~=0.9.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.28 depends on daily-python~=0.9.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.27 depends on daily-python~=0.9.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.26 depends on daily-python~=0.9.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.25 depends on daily-python~=0.9.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.24 depends on daily-python~=0.9.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.23 depends on daily-python~=0.8.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.22 depends on daily-python~=0.8.0; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.21 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.20 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.19 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.18 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.17 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.16 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.15 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.14 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.13 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.12 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.11 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.10 depends on daily-python~=0.7.4; extra == "daily"
pipecat-ai[daily,openai,silero] 0.0.9 depends on daily-python~=0.7.4; extra == "daily"

[Feature Request] Real-Time Usage Count for Each Service Provider

Description:
To enhance the development of consumer-facing applications, it would be extremely beneficial to have an out-of-the-box feature that provides a real-time counter for the usage of tokens/characters across various services such as LLMs (Large Language Models), TTS (Text-to-Speech), STT (Speech-to-Text), and others. This feature would enable developers to set and monitor usage limits effectively.

Benefits:

  • Allows developers to manage and control the usage of different service providers.
  • Facilitates the setting of usage limits for better resource allocation and cost management.
  • Provides real-time insights into the consumption patterns of various services.

Use Case:
As a developer building a consumer-facing application, I want to set a usage limit on the number of tokens/characters for services like LLMs, TTS, and STT. Having a real-time counter integrated into the system would allow me to track usage efficiently and ensure that my application stays within the predefined limits.

Suggested Implementation:

  • Introduce a real-time counter for each service provider.
  • Display the current usage count of tokens/characters.
  • Provide an option to set usage limits and receive notifications/alerts when limits are approaching or exceeded.

Conclusion:
Implementing a real-time usage counter for service providers would greatly improve the ability of developers to manage and optimize the use of various services, leading to more efficient and cost-effective applications.
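
Concretely, a minimal sketch of such a counter as a Pipecat FrameProcessor, assuming the FrameProcessor/TextFrame APIs used elsewhere on this page (UsageCounter and the limit callback are hypothetical, not an existing API):

from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class UsageCounter(FrameProcessor):
    """Hypothetical processor that counts characters flowing through it."""

    def __init__(self, limit: int, on_limit_exceeded=None):
        super().__init__()
        self._limit = limit
        self._count = 0
        self._on_limit_exceeded = on_limit_exceeded  # optional alert callback

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TextFrame):
            self._count += len(frame.text)
            if self._on_limit_exceeded and self._count > self._limit:
                self._on_limit_exceeded(self._count)

        await self.push_frame(frame, direction)

Placed immediately before a TTS service in a pipeline, it would count the characters about to be synthesized; analogous counters could track LLM token usage or STT audio duration.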

Add service OpenAITTSService

The OpenAI Text-To-Speech sounds very natural in different languages. Is this something pipecat wants to support?
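
Presumably such a service would wrap OpenAI's speech endpoint; a hedged sketch of the underlying call, using the openai client directly rather than any Pipecat API:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The raw endpoint an OpenAITTSService would presumably wrap: stream
# 24 kHz 16-bit PCM for a given voice and feed the chunks downstream.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello from Pipecat!",
    response_format="pcm",
) as response:
    for chunk in response.iter_bytes():
        pass  # push as audio frames into a pipeline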

FastAPIWebSocketTransport not working when used in non Twilio settings

I am trying to implement a FastAPI server that can accept both Twilio calls & websocket calls but found that the FastAPIWebSocket transport did not drop in nicely.

I was using the examples/websocket-server client as a means to check my work and was confused why the WebSocketServerTransport worked whereas the FastAPI one did not. I noticed playback from the voice assistant was working, but no inputs were getting through.

I did some digging in the code for the transport input and found that the following monkey patch fixed it for this particular use case:

from functools import partial

from pipecat.frames.frames import AudioRawFrame

transport = FastAPIWebsocketTransport(
    websocket=websocket_client,
    params=FastAPIWebsocketParams(
        audio_out_enabled=True,
        add_wav_header=True,
        vad_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
        vad_audio_passthrough=True,
        serializer=ProtobufFrameSerializer(),
    ),
)

async def _patched_receive_messages(self):
    # Iterate binary messages (iter_bytes, not iter_text) so the
    # protobuf-serialized frames arrive intact.
    async for message in self._websocket.iter_bytes():
        frame = self._params.serializer.deserialize(message)
        if not frame:
            continue
        if isinstance(frame, AudioRawFrame):
            await self.push_audio_frame(frame)
        else:
            await self._internal_push_frame(frame)
    await self._callbacks.on_client_disconnected(self._websocket)

transport._input._receive_messages = partial(_patched_receive_messages, transport._input)

The two changes were:

  1. I had to use self._websocket.iter_bytes() instead of self._websocket.iter_text() to tease out the protobuf data
  2. I had to make use of the _internal_push_frame function, which isn't used by the current implementation

This makes sense when working with the Twilio WS API, but it would be nice if it could be generalized to accommodate this workflow.

Storybot interaction issues

I modified the LLM service code to use a local Ollama instance. After the program started, everything was fine at first: the bot asked me what kind of story I wanted to hear, I answered, and it told me the first paragraph. But when it finished that paragraph and asked what I wanted next, the microphone on the page turned gray, and there was no error message in the log. The page was obviously waiting for my response, but because the microphone was grayed out I couldn't answer. The Chrome tab shows that the microphone is running normally. I've posted the log below, hoping someone can help me.

  • (venv) (base) pope@Hengs-MBP storytelling-chatbot % python src/server.py
  • INFO: Started server process [7067]
  • INFO: Waiting for application startup.
  • INFO: Application startup complete.
  • INFO: Uvicorn running on http://0.0.0.0:7860 (Press CTRL+C to quit)
  • INFO: 127.0.0.1:50611 - "GET / HTTP/1.1" 200 OK
  • INFO: 127.0.0.1:50619 - "POST /create HTTP/1.1" 200 OK
  • INFO: 127.0.0.1:50619 - "POST /start HTTP/1.1" 200 OK
  • 2024-06-06 11:02:09.055 | DEBUG | main:main:55 - Transport created for room:https://popeking.daily.co/S9xQJMm9alpJQ5nWPcSM
  • 2024-06-06 11:02:09.055 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking PipelineSource#0 -> OllamaLLMService#0
  • 2024-06-06 11:02:09.056 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking OllamaLLMService#0 -> ElevenLabsTTSService#0
  • 2024-06-06 11:02:09.056 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking ElevenLabsTTSService#0 -> DailyOutputTransport#0
  • 2024-06-06 11:02:09.056 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking DailyOutputTransport#0 -> PipelineSink#0
  • 2024-06-06 11:02:09.056 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking Source#0 -> Pipeline#0
  • 2024-06-06 11:02:09.056 | DEBUG | main:main:109 - Waiting for participant...
  • 2024-06-06 11:02:09.056 | DEBUG | pipecat.pipeline.runner:run:28 - Runner PipelineRunner#0 started running PipelineTask#0
  • 2024-06-06 11:02:09.056 | INFO | pipecat.transports.services.daily:_join:224 - Joining https://popeking.daily.co/S9xQJMm9alpJQ5nWPcSM
  • 2024-06-06 11:02:10.374 | INFO | pipecat.transports.services.daily:on_participant_joined:423 - Participant joined 89f7ab49-1166-4d7c-b044-318db56cf91e
  • 2024-06-06 11:02:10.375 | DEBUG | main:on_first_participant_joined:113 - Participant joined, storytime commence!
  • 2024-06-06 11:02:11.894 | INFO | pipecat.transports.services.daily:_handle_join_response:285 - Joined https://popeking.daily.co/S9xQJMm9alpJQ5nWPcSM
  • 2024-06-06 11:02:11.895 | INFO | pipecat.transports.services.daily:_handle_join_response:288 - Enabling transcription with settings language='en' tier='nova' model='2-conversationalai' profanity_filter=True redact=False endpointing=True punctuate=True includeRawResponse=True extra={'interim_results': True}
  • 2024-06-06 11:02:12.534 | DEBUG | pipecat.transports.services.daily:on_transcription_started:450 - Transcription started: {'language': 'en', 'startedBy': '817bd6c7-896f-4686-a824-217fb6129d26', 'tier': 'nova', 'transcriptId': '6056d8f3-d20c-478c-a906-b45d15642987', 'model': '2-conversationalai'}
  • 2024-06-06 11:02:23.432 | DEBUG | pipecat.services.elevenlabs:run_tts:35 - Generating TTS: [Welcome to my whimsy workshop! I'm thrilled to spin a tale just for you. What kind of story would you love to embark on? Would you like it to be adventurous, mysterious, romantic, or perhaps something entirely unexpected? Let me know, and we'll conjure up a fantastical journey together!]
  • 2024-06-06 11:02:25.426 | DEBUG | pipecat.pipeline.runner:run:32 - Runner PipelineRunner#0 finished running PipelineTask#0
  • 2024-06-06 11:02:25.427 | DEBUG | pipecat.transports.services.daily:init:70 - Loaded native WebRTC VAD
  • 2024-06-06 11:02:25.427 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking PipelineSource#1 -> DailyInputTransport#0
  • 2024-06-06 11:02:25.427 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking DailyInputTransport#0 -> LLMUserResponseAggregator#0
  • 2024-06-06 11:02:25.427 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking LLMUserResponseAggregator#0 -> OllamaLLMService#0
  • 2024-06-06 11:02:25.428 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking OllamaLLMService#0 -> StoryProcessor#0
  • 2024-06-06 11:02:25.428 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking StoryProcessor#0 -> StoryImageProcessor#0
  • 2024-06-06 11:02:25.428 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking StoryImageProcessor#0 -> ElevenLabsTTSService#0
  • 2024-06-06 11:02:25.428 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking ElevenLabsTTSService#0 -> DailyOutputTransport#0
  • 2024-06-06 11:02:25.428 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking DailyOutputTransport#0 -> LLMAssistantResponseAggregator#0
  • 2024-06-06 11:02:25.428 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking LLMAssistantResponseAggregator#0 -> PipelineSink#1
  • 2024-06-06 11:02:25.428 | DEBUG | pipecat.processors.frame_processor:link:37 - Linking Source#1 -> Pipeline#1
  • 2024-06-06 11:02:25.428 | DEBUG | pipecat.pipeline.runner:run:28 - Runner PipelineRunner#0 started running PipelineTask#1
  • 2024-06-06 11:02:43.758 | DEBUG | pipecat.transports.services.daily:_on_transcription_message:840 - Transcription (from: 89f7ab49-1166-4d7c-b044-318db56cf91e): [Perhaps.]
  • 2024-06-06 11:02:45.488 | DEBUG | pipecat.transports.services.daily:_on_transcription_message:840 - Transcription (from: 89f7ab49-1166-4d7c-b044-318db56cf91e): [Something like]
  • 2024-06-06 11:02:47.290 | DEBUG | pipecat.transports.services.daily:_on_transcription_message:840 - Transcription (from: 89f7ab49-1166-4d7c-b044-318db56cf91e): [mysteries.]
  • 2024-06-06 11:03:13.508 | DEBUG | pipecat.services.fal:run_image_gen:55 - Generating image from prompt: illustrative art of a tiny, curious fairy with iridescent wings, a brown fluffy dog, and a tiny red cat in a whimsical forest with sparkling fireflies. In the style of Studio Ghibli. colorful, whimsical, painterly, concept art.
  • 2024-06-06 11:03:15.136 | DEBUG | pipecat.services.fal:run_image_gen:69 - Image generated at: https://fal.media/files/koala/ghhFxFx4ZIzGwDQEgUA-r.png
  • 2024-06-06 11:03:15.136 | DEBUG | pipecat.services.fal:run_image_gen:72 - Downloading image https://fal.media/files/koala/ghhFxFx4ZIzGwDQEgUA-r.png ...
  • 2024-06-06 11:03:16.505 | DEBUG | pipecat.services.fal:run_image_gen:74 - Downloaded image https://fal.media/files/koala/ghhFxFx4ZIzGwDQEgUA-r.png
  • 2024-06-06 11:03:16.973 | DEBUG | pipecat.services.elevenlabs:run_tts:35 - Generating TTS: [In the heart of the enchanted woods, a tiny fairy named Luna befriended a playful pup and a mischievous kitty who stumbled upon a hidden glade . <a delicate, shimmering portal surrounded by vines and twinkling flowers> As they explored the clearing, they discovered a glowing portal that seemed to pulse with an otherworldly energy, beckoning them to venture forth . <a soft, ethereal mist swirling around the trio as they step through the portal> With a collective sense of wonder, Luna, the dog, and the cat stepped through the shimmering gateway, and into a realm beyond their wildest dreams . How would you like the story to continue?]
  • 2024-06-06 11:03:50.483 | DEBUG | pipecat.services.fal:run_image_gen:55 - Generating image from prompt: illustrative art of = N <= B`.<a whimsical forest with towering trees, twinkling fireflies, and a crescent moon. In the style of Studio Ghibli. colorful, whimsical, painterly, concept art.
  • 2024-06-06 11:03:51.997 | DEBUG | pipecat.services.fal:run_image_gen:69 - Image generated at: https://fal.media/files/lion/ImN8FiXw8jRjKVPrQHks0.png
  • 2024-06-06 11:03:51.997 | DEBUG | pipecat.services.fal:run_image_gen:72 - Downloading image https://fal.media/files/lion/ImN8FiXw8jRjKVPrQHks0.png ...
  • 2024-06-06 11:03:53.843 | DEBUG | pipecat.services.fal:run_image_gen:74 - Downloaded image https://fal.media/files/lion/ImN8FiXw8jRjKVPrQHks0.png
  • 2024-06-06 11:03:54.481 | DEBUG | pipecat.services.elevenlabs:run_tts:35 - Generating TTS: [You're thinking of a way to generate random numbers within a specific range. Here's how you can do it in Python: import random # Generate a random number between A and B (inclusive) A = 1 B = 100 random_number = random.randint(A, B) print(random_number) In this code, randint(A, B) generates a random integer N such that `A In the heart of this enchanted forest, a curious adventurer stumbled upon a hidden path <the adventurer: a young girl with wild curly hair, wearing a flowing white dress, holding a lantern> She had been searching for the legendary Moonflower, said to bloom only once a year under the gentle light of the crescent moon How would you like the story to continue?]

Azure LLM Exception

TL;DR

I've found an AttributeError: 'AzureLLMService' object has no attribute '_endpoint' error when attempting to use the AzureLLMService instead of the BaseOpenAILLMService in one of the provided example projects.

The error appears to be an instance of using an uninitialised attribute. The AzureLLMService constructor calls the parent BaseOpenAILLMService constructor before initialising its own _endpoint attribute, resulting in an AttributeError when the parent constructor (via the overridden create_client) tries to access the non-existent _endpoint.

Detailed report

Environment

I'm using an M3 Pro MacBook Pro running MacOS Sonoma 14.1 (23B2073) and Python 3.11.7 using a virtual environment.

Reproduction steps

I've been trying to run the simple-chatbot example project but using the Azure OpenAI Service by replacing the provided BaseOpenAILLMService with AzureLLMService and updating the appropriate environment variables where required.

The resulting invocation (bot.py line 108) looks like:

llm = AzureLLMService(
    api_key=os.getenv("AZURE_OAI_KEY"),
    endpoint=os.getenv("AZURE_OAI_ENDPOINT"),
    model=os.getenv("AZURE_OAI_MODEL"),
)

Provided that all other required environment variables are set and correct (they are), running server.py as instructed by the README.md successfully starts the FastAPI server.

The actual exception is raised when accessing the /start endpoint via web browser, which launches the Daily UI but prevents the bot from joining the room.

Expected behaviour

The example project should work just fine by replacing the BaseOpenAILLMService with the AzureLLMService.

Actual behaviour

An exception is being raised:

  File "[REDACTED]/pipecat/examples/simple-chatbot/bot.py", line 108, in main
    llm = AzureLLMService(
          ^^^^^^^^^^^^^^^^
  File "[REDACTED]/pipecat/src/pipecat/services/azure.py", line 81, in __init__
    super().__init__(api_key=api_key, model=model)
  File "[REDACTED]/pipecat/src/pipecat/services/openai.py", line 63, in __init__
    self.create_client(api_key=api_key, base_url=base_url)
  File "[REDACTED]/pipecat/src/pipecat/services/azure.py", line 90, in create_client
    azure_endpoint=self._endpoint,
                   ^^^^^^^^^^^^^^
AttributeError: 'AzureLLMService' object has no attribute '_endpoint'

Probable Cause

The current implementation for the AzureLLMService is:

class AzureLLMService(BaseOpenAILLMService):
    def __init__(
            self,
            *,
            api_key,
            endpoint,
            api_version="2023-12-01-preview",
            model
    ):
        super().__init__(api_key=api_key, model=model)
        self._endpoint = endpoint
        self._api_version = api_version
        self._model: str = model

Its parent class is the BaseOpenAILLMService (abbreviated):

class BaseOpenAILLMService(LLMService):
    def __init__(self, model: str, api_key=None, base_url=None):
        super().__init__()
        self._model: str = model
        self.create_client(api_key=api_key, base_url=base_url)

The AttributeError observed when using the AzureLLMService class can be attributed to the order of attribute initialisation and method invocation in the inheritance hierarchy.

When an instance of AzureLLMService is created, the constructor (__init__ method) of AzureLLMService is invoked. The first line of this constructor calls the constructor of its parent class, BaseOpenAILLMService, using super().__init__(api_key=api_key, model=model). This invokes the parent constructor before the _endpoint attribute is initialised in the AzureLLMService constructor.

The BaseOpenAILLMService constructor, in turn, calls the create_client method (self.create_client(api_key=api_key, base_url=base_url)). Due to the principles of inheritance and polymorphism, the create_client method of the AzureLLMService class is invoked, as it overrides the create_client method of the parent class.

However, at this point, the _endpoint attribute has not yet been initialised in the AzureLLMService constructor, as the self._endpoint = endpoint line has not been executed. Therefore, when the create_client method of AzureLLMService tries to access self._endpoint, it raises an AttributeError because the _endpoint attribute does not exist.

This sequence of events violates the expected initialisation order and results in attempting to access an attribute before it has been properly initialised, leading to the AttributeError.
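
The pitfall can be reproduced in a few lines of plain Python, independent of Pipecat:

class Base:
    def __init__(self):
        self.create_client()  # dispatches to the subclass override

    def create_client(self):
        pass


class Child(Base):
    def __init__(self):
        super().__init__()      # create_client() runs here...
        self._endpoint = "..."  # ...before _endpoint exists

    def create_client(self):
        print(self._endpoint)   # AttributeError


Child()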

Recommended solution

Given the current implementation, I suggest simply moving the super().__init__(api_key=api_key, model=model) call to the very end of the AzureLLMService constructor, so that all required attributes are defined BEFORE they are required.

The following implementation works perfectly and passes the Azure LLM Integration Tests.

class AzureLLMService(BaseOpenAILLMService):
    def __init__(
            self,
            *,
            api_key,
            endpoint,
            api_version="2023-12-01-preview",
            model
    ):
        
        self._endpoint = endpoint
        self._api_version = api_version
        self._model: str = model

        super().__init__(api_key=api_key, model=model)

I'll be happy to open a PR for this, just wanted to be sure first. Please let me know if you have any questions.

[Question] React Native Integration with Pipecat and Daily

Hi Pipecat Team,

I'm exploring the integration of Pipecat (Pipeline: Speech-to-text -> LLM -> Text-to-Speech) into my React Native app using Expo. I have a couple of questions on the best approach for this integration:

  1. Daily Integration: Should Daily be integrated directly within my app's client, or would it be more efficient to deploy Pipecat to AWS (perhaps using a Lambda architecture with state management) and have the Expo app communicate with that cloud service, with all the Pipecat code living in AWS?

  2. Best Practices: Are there any specific best practices or recommendations you have for integrating Pipecat with a React Native app, especially considering the constraints and capabilities of Expo?

Any guidance or examples you could provide would be greatly appreciated!

Thank you for your help!

Twilio-chatbot exception

I've tried running the example locally and within docker.

It always fails in the same way for me:

ERROR: Exception in ASGI application
Traceback (most recent call last):
File "...uvicorn/protocols/websockets/websockets_impl.py", line 244, in run_asgi
result = await self.app(self.scope, self.asgi_receive, self.asgi_send) # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...uvicorn/middleware/proxy_headers.py", line 70, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

...

File ".../pipecat/examples/twilio-chatbot/bot.py", line 39, in run_bot
serializer=TwilioFrameSerializer(stream_sid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: TwilioFrameSerializer.__init__() takes 1 positional argument but 2 were given

How to change STT api

How can I change the speech recognition interface? I want to use iFlytek's speech recognition, or our own, and to have conversations in Chinese.
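
Pipecat doesn't ship an iFlytek integration, so one hedged option is a custom processor that consumes audio frames and pushes transcription frames itself. In this sketch, recognize_iflytek is a placeholder you would implement against iFlytek's SDK, and the TranscriptionFrame constructor arguments are assumptions based on its use elsewhere:

import time

from pipecat.frames.frames import AudioRawFrame, Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


async def recognize_iflytek(audio: bytes) -> str:
    """Placeholder: call iFlytek's ASR API here and return the text."""
    raise NotImplementedError


class IFlytekSTTService(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, AudioRawFrame):
            text = await recognize_iflytek(frame.audio)
            if text:
                # Constructor arguments assumed: text, user id, timestamp.
                await self.push_frame(TranscriptionFrame(text, "", str(time.time())))
        else:
            await self.push_frame(frame, direction)

It would slot into a pipeline where the examples on this page place their STT service, between transport.input() and the LLM aggregator.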

Unable to hear pipeline outputs or see transcriptions in my audio pipeline

I am experiencing issues with my audio processing pipeline. When I comment out transport.output() in the pipeline, I can see the interaction logs without hearing the audio from ElevenLabs. However, if I uncomment that line, I neither hear the audio output nor see the transcription logs from Deepgram.

Here is the relevant part of my code:

import asyncio
import os
import sys

import aiohttp
from loguru import logger
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport
from pipecat.vad.silero import SileroVADAnalyzer
from pipecat.vad.vad_analyzer import VADParams

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


class TranscriptionLogger(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")


async def main():
    async with aiohttp.ClientSession() as session:
        transport = LocalAudioTransport(
            TransportParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
                transcription_enabled=True,
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(params=VADParams(min_volume=0.6)),
                vad_audio_passthrough=True,
            )
        )

        stt = DeepgramSTTService(os.environ["DEEPGRAM_API_KEY"])

        llm = OpenAILLMService(
            api_key=os.environ["OPENAI_API_KEY"],
            model="gpt-3.5-turbo-0125",
        )

        messages = [
            {
                "role": "system",
                "content": "Say hello.",
            },
        ]

        tma_in = LLMUserResponseAggregator(messages)
        tma_out = LLMAssistantResponseAggregator(messages)

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.environ["ELEVENLABS_API_KEY"],
            voice_id=os.environ["ELEVENLABS_VOICE_ID"],
        )

        pipeline = Pipeline(
            [
                transport.input(),
                stt,
                tma_in,
                llm,
                tts,
                transport.output(),
                tma_out,
            ]
        )

        task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

Additionally, when I add an introductory message from the LLM, I can hear the initial message, but the interaction with the LLM stops. Here is the code snippet for that:

async def say_something():
    messages.append(
        {
            "role": "system",
            "content": "Please briefly introduce yourself to the user.",
        }
    )
    await task.queue_frames([LLMMessagesFrame(messages)])

# await runner.run(task)
await asyncio.gather(runner.run(task), say_something())

I am using a Mac and pipecat-ai==0.0.31. Do you see any obvious issues in my code, or could the problem be elsewhere?

[BUG] Silero expects different number of samples

This error just started appearing this afternoon on the latest version of the code; I didn't update any libraries and I'm using the same Daily room. I don't know what could have caused this.


2024-06-30 05:51:24.814 | ERROR    | pipecat.vad.silero:voice_confidence:74 - Error analyzing audio with Silero VAD: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/vad/model/vad_annotator.py", line 26, in forward
    if _2:
      _3 = torch.format(_0, (torch.size(x0))[-1])
      ops.prim.RaiseException(_3, "builtins.ValueError")
      ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    else:
      pass

Traceback of TorchScript, original code (most recent call last):
  File "/home/keras/notebook/nvme1/adamnsandle/silero-models-research/vad/model/vad_annotator.py", line 484, in forward
        num_samples = 512 if sr == 16000 else 256
        if x.shape[-1] != num_samples:
            raise ValueError(f"Provided number of samples is {x.shape[-1]} (Supported values: 256 for 8000 sample rate, 512 for 16000)")
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    
        batch_size = x.shape[0]
builtins.ValueError: Provided number of samples is 640 (Supported values: 256 for 8000 sample rate, 512 for 16000)

VAD usage issue, provided sample is incorrect

Hi there, I'm getting this error when using the VAD module:
Provided number of samples is 320 (Supported values: 256 for 8000 sample rate, 512 for 16000)

Here is my little demo code. The sample rate is 8000 Hz with 1 channel:

import wave

import numpy as np

from pipecat.vad.silero import SileroVADAnalyzer
from pipecat.vad.vad_analyzer import VADParams

wav = wave.open(r'demo-instruct.wav', "rb")
total_frames = wav.getnframes()
sample_rate = wav.getframerate()
num_channels = wav.getnchannels()
print(sample_rate)
print(num_channels)

vad_analyzer = SileroVADAnalyzer(
    sample_rate=sample_rate, params=VADParams())

sent_frames = 0

while sent_frames < total_frames:
    # Read 128 frames (16 ms of mono audio at 8000 Hz).
    frames = wav.readframes(128)
    audio_data = np.frombuffer(frames, dtype=np.int16)

    # Convert back to bytes
    frames_to_send = audio_data.tobytes()
    if len(frames_to_send) > 0:
        sent_frames += len(audio_data)  # count frames, not bytes
        new_vad_state = vad_analyzer.analyze_audio(frames_to_send)
        print(new_vad_state)

Any ideas on this? The program itself doesn't raise an error; something inside Silero itself prints the error without actually raising it:

Traceback of TorchScript, original code (most recent call last):
  File "/home/keras/notebook/nvme1/adamnsandle/silero-models-research/vad/model/vad_annotator.py", line 484, in forward
        num_samples = 512 if sr == 16000 else 256
        if x.shape[-1] != num_samples:
            raise ValueError(f"Provided number of samples is {x.shape[-1]} (Supported values: 256 for 8000 sample rate, 512 for 16000)")
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    
        batch_size = x.shape[0]
builtins.ValueError: Provided number of samples is 320 (Supported values: 256 for 8000 sample rate, 512 for 16000)
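
For reference, the numbers in the message are Silero's fixed analysis windows, and the chunks actually reaching the model here work out to 40 ms of audio rather than the 32 ms window the model expects. The arithmetic:

# Silero VAD consumes fixed-size windows: 256 samples at 8 kHz, 512 at 16 kHz.
for sample_rate, num_samples in [(8000, 256), (16000, 512)]:
    print(sample_rate, "Hz ->", num_samples, "samples =",
          1000 * num_samples // sample_rate, "ms")  # 32 ms in both cases

# The chunks reaching the model in this report are 320 samples at 8 kHz...
print(1000 * 320 // 8000, "ms")   # 40 ms
# ...and 640 samples at 16 kHz in the issue above, also 40 ms, so something
# upstream appears to be slicing audio into 40 ms pieces.
print(1000 * 640 // 16000, "ms")  # 40 ms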

UnknownCallClientError + 11Labs Error

Trying to run the chatbot.py, and encountered the following errors. I think there are two errors:

  1. A daily-core error about an unknown call client.
  2. An 11Labs error. I have a free account with 11Labs and have not used any of the free credits, so it should work. I can probably debug the 11Labs error separately; it is likely not super relevant to Daily.

I have populated the env variables. And here is the shape of my env:

OPENAI_API_KEY=sk-xxxx                                                                                                 
ELEVENLABS_API_KEY=xxx
ELEVENLABS_VOICE_ID=CnV6BQOHeZCIv4McSXDH
DAILY_SAMPLE_ROOM_URL=https://xx.daily.co/xx-x
DAILY_API_KEY=xxxx

Detailed error

(smol) sasha@iSashair daily-ai-sdk % python src/examples/starter-apps/chatbot.py             
Using cache found in /Users/sasha/.cache/torch/hub/snakers4_silero-vad_master
Eleven Labs API Key: xxx
Eleven Labs Voice ID: xxx
20 2024-03-16 14:30:07,548 🎬 Starting frame consumer thread
{"timestamp":"2024-03-16T21:30:07.577665Z","level":"ERROR","fields":{"message":"startTranscription (request 3) encountered an error: Transcription(Properties(UnknownCallClientError))"},"target":"daily_core::native::ffi::call_client"}
40 2024-03-16 14:30:07,578 on_error: Error handling startTranscription: Transcription(Properties(UnknownCallClientError))
!!! in here, pipeline.source is <Queue maxsize=0 _getters[1]>
10 2024-03-16 14:30:17,075 Generating chat via openai: [{"content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself.", "role": "system", "name": "system"}]
[W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.
20 2024-03-16 14:30:18,795 === OpenAI LLM TTFB: 1.7094330787658691
40 2024-03-16 14:30:19,559 audio fetch status code: 401, error: <bound method ClientResponse.text of <ClientResponse(https://api.elevenlabs.io/v1/text-to-speech/CnV6BQOHeZCIv4McSXDH/stream?output_format=pcm_16000&optimize_streaming_latency=2) [401 Unauthorized]>
<CIMultiDictProxy('Date': 'Sat, 16 Mar 2024 21:30:18 GMT', 'Server': 'uvicorn', 'Content-Length': '476', 'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Headers': '*', 'Access-Control-Allow-Methods': 'POST, OPTIONS, DELETE, GET, PUT', 'Access-Control-Max-Age': '600', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000')>
>

Storytelling Chatbot client crashes with client-side exception

The Storytelling Chatbot client sometimes reports the following error on opening http://0.0.0.0:7860:

Application error: a client-side exception has occurred (see the browser console for more information).

Here is the browser console log:

TypeError: Cannot read properties of undefined (reading 'ondevicechange')
    at nx (d50e61c5-729bf43d5cce1859.js:19:31033)
    at Z.startListeningForDeviceChanges (d50e61c5-729bf43d5cce1859.js:19:42326)
    at new Z (d50e61c5-729bf43d5cce1859.js:19:49895)
    at Function.value (d50e61c5-729bf43d5cce1859.js:19:118289)
    at 647-07448eb149f02d56.js:1:32531
    at aI (fd9d1056-15205bcf7b0e6812.js:1:72882)
    at a3 (fd9d1056-15205bcf7b0e6812.js:1:84323)
    at a5 (fd9d1056-15205bcf7b0e6812.js:1:84961)
    at a8 (fd9d1056-15205bcf7b0e6812.js:1:84845)
    at a5 (fd9d1056-15205bcf7b0e6812.js:1:84941)

In the cases I don't get the above error, I get the following message:

This demo is currently at capacity. Please try again later.

To summarize, this demo doesn't work for me, or I don't know how to run it.

Implement Google Gemini LLM service

I'm working on a Google Gemini LLM service for Pipecat and interested in any feedback people have about the LLMMessagesFrame class.

All the other LLMs with a chat (multi-turn) fine-tuning that I've worked with have adopted OpenAI's messages array format. Google's format is a bit different.

  • The role can only be user or model. Contrast with user, assistant, or system for OpenAI.
  • The message content shape is parts: [<string>, ...] instead of just content: <string>.
  • Inline image data is also typed differently.

https://ai.google.dev/api/python/google/ai/generativelanguage/Content

https://ai.google.dev/gemini-api/docs/get-started/tutorial?lang=python#encode_messages

We could do at least three different things.

  1. Implement the Gemini service so that it translates internally from the OpenAI data shape used by LLMMessage into the google.ai.generativelanguage data structures.
  2. Implement a new LLMMessage class/subclass for use with Gemini models.
  3. Design an abstraction that can represent higher-level concepts and that all of our LLM services will use.

I lean towards (1). I think it will be fairly straightforward and we can always do (2) later if we need to.
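
For concreteness, a rough sketch of the translation in (1), assuming the role/parts shape from the docs linked above:

def openai_to_google(messages):
    # Map OpenAI-style chat messages into Google's role/parts shape.
    google_messages = []
    for m in messages:
        role = m["role"]
        if role == "assistant":
            role = "model"  # Google uses "model", not "assistant"
        elif role == "system":
            role = "user"   # no system role in this message format
        google_messages.append({"role": role, "parts": [m["content"]]})
    return google_messages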

But I haven't yet gotten to the context management code here, and that may complicate things. Note: we can't use the Google library's convenience functions for multi-turn chat context management, because pipelines need to be interruptible. One important part of interruptibility is making sure that the LLM context includes only sentences that the LLM has "said" out loud to the user.

Any other thoughts here are welcome!

Also, Discord thread is here if people want to hash things out ephemerally before etching pixels into the stone tablet of an issue comment: https://discord.com/channels/1239284677165056021/1239284677823565826/1240682255584854027

Add Google TTS service

Google Text-to-Speech seems to have good support for a lot of different languages. Is adding this to Pipecat a good idea?

[Feature Request] Function Calling Integration with Google Gemini

Description:
To enhance the capabilities and interactivity of our application, it would be highly beneficial to integrate function calling with the Google Gemini chatbot. This feature would enable the chatbot to invoke predefined functions based on user inputs and context, allowing for more dynamic and functional conversations. According to the Gorilla leaderboard, this would provide much better performance than GPT-3.5 at a lower cost than GPT-4.

I attempted this integration myself but was unsuccessful:

#
# Copyright (c) 2024, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import asyncio
import json
from typing import Callable, List

from loguru import logger

from pipecat.frames.frames import (
    Frame,
    LLMFullResponseEndFrame,
    LLMFullResponseStartFrame,
    LLMMessagesFrame,
    LLMResponseEndFrame,
    LLMResponseStartFrame,
    TextFrame,
    VisionImageRawFrame,
)
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext, OpenAILLMContextFrame
from pipecat.processors.frame_processor import FrameDirection
from pipecat.services.ai_services import LLMService

try:
    import google.ai.generativelanguage as glm
    import google.generativeai as gai
except ModuleNotFoundError as e:
    logger.error(f"Exception: {e}")
    logger.error(
        "In order to use Google AI, you need to `pip install pipecat-ai[google]`. Also, set `GOOGLE_API_KEY` environment variable."
    )
    raise Exception(f"Missing module: {e}")


class GoogleLLMService(LLMService):
    """This class implements inference with Google's AI models

    This service translates internally from OpenAILLMContext to the messages format
    expected by the Google AI model. We are using the OpenAILLMContext as a lingua
    franca for all LLM services, so that it is easy to switch between different LLMs.
    """

    def __init__(self, api_key: str, model: str = "gemini-1.5-flash-latest", tools: List[Callable] = None, **kwargs):
        super().__init__(**kwargs)
        gai.configure(api_key=api_key)
        self._tools = tools or []
        self._client = gai.GenerativeModel(model, tools=self._tools)

    def can_generate_metrics(self) -> bool:
        return True

    def _get_messages_from_openai_context(self, context: OpenAILLMContext) -> List[glm.Content]:
        openai_messages = context.get_messages()
        google_messages = []

        for message in openai_messages:
            role = message["role"]
            content = message["content"]
            if role == "system":
                role = "user"
            elif role == "assistant":
                role = "model"

            parts = [glm.Part(text=content)]
            if "mime_type" in message:
                parts.append(
                    glm.Part(inline_data=glm.Blob(mime_type=message["mime_type"], data=message["data"].getvalue()))
                )
            google_messages.append({"role": role, "parts": parts})

        return google_messages

    async def _async_generator_wrapper(self, sync_generator):
        for item in sync_generator:
            yield item
            await asyncio.sleep(0)

    async def _process_context(self, context: OpenAILLMContext):
        await self.push_frame(LLMFullResponseStartFrame())
        try:
            logger.debug(f"Generating chat: {context.get_messages_json()}")

            messages = self._get_messages_from_openai_context(context)

            await self.start_ttfb_metrics()

            response = self._client.generate_content(
                messages, generation_config=gai.GenerationConfig(temperature=0), tools=context.tools, stream=True
            )

            await self.stop_ttfb_metrics()

            async for chunk in self._async_generator_wrapper(response):
                try:
                    for candidate in chunk.candidates:
                        if candidate.content:
                            if candidate.content.parts:
                                for part in candidate.content.parts:
                                    if part.text:
                                        await self.push_frame(LLMResponseStartFrame())
                                        await self.push_frame(TextFrame(part.text))
                                        await self.push_frame(LLMResponseEndFrame())
                        if candidate.function_call:
                            for function_call in candidate.function_call:
                                function_name = function_call.name
                                arguments = function_call.args
                                result = await self.call_function(function_name, arguments)

                                # Add function call and result to context
                                context.add_message({"role": "assistant", "content": f"Function call: {function_name}"})
                                context.add_message(
                                    {"role": "function", "name": function_name, "content": json.dumps(result)}
                                )

                                # Send function response back to the model
                                function_response = glm.Content(
                                    parts=[
                                        glm.FunctionResponse(
                                            name=function_name, response={"content": json.dumps(result)}
                                        )
                                    ]
                                )

                                # Re-process context with function result
                                context.add_message(function_response)
                                await self._process_context(context)
                                return

                except Exception as e:
                    # Google LLMs seem to flag safety issues a lot!
                    if chunk.candidates[0].finish_reason == 3:
                        logger.debug(f"LLM refused to generate content for safety reasons - {messages}.")
                    else:
                        logger.error(f"{self} error: {e}")

        except Exception as e:
            logger.error(f"{self} exception: {e}")
        finally:
            await self.push_frame(LLMFullResponseEndFrame())

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        context = None

        if isinstance(frame, OpenAILLMContextFrame):
            context: OpenAILLMContext = frame.context
        elif isinstance(frame, LLMMessagesFrame):
            context = OpenAILLMContext.from_messages(frame.messages)
        elif isinstance(frame, VisionImageRawFrame):
            context = OpenAILLMContext.from_image_frame(frame)
        else:
            await self.push_frame(frame, direction)

        if context:
            await self._process_context(context)

daily transport ChannelNotOpen error when running code on cloud platform

When I run the simple-bot example on a cloud machine, I get these errors:
2024-07-11 10:54:32.151 | ERROR | pipecat.transports.services.daily:join:280 - Time out joining https://autoark.daily.co/1r4by424NvuhJS8hxxxx
{"timestamp":"2024-07-11T02:54:44.715005Z","level":"ERROR","fields":{"message":"Failed to fetch room information: GET failed: Transport(Transport { kind: ConnectionFailed, message: Some("Connect error"), url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("gs.daily.co")), port: None, path: "/rooms/check/autoark/1r4by424NvuhJS8hOpRF", query: None, fragment: None }), source: Some(Error { kind: TimedOut, message: "connection timed out" }) })"},"target":"daily_core::call_client"}
{"timestamp":"2024-07-11T02:54:44.715073Z","level":"ERROR","fields":{"message":"Failed to fetch room information: GET failed: Transport(Transport { kind: ConnectionFailed, message: Some("Connect error"), url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("gs.daily.co")), port: None, path: "/rooms/check/autoark/1r4by424NvuhJS8hOpRF", query: None, fragment: None }), source: Some(Error { kind: TimedOut, message: "connection timed out" }) })"},"target":"daily_core::event"}
{"timestamp":"2024-07-11T02:54:44.734809Z","level":"ERROR","fields":{"message":"join (request 2) encountered an error: Connection(Api(RoomLookup(RoomInfoFetchFailed(Get(Transport(Transport { kind: ConnectionFailed, message: Some("Connect error"), url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("gs.daily.co")), port: None, path: "/rooms/check/autoark/1r4by424NvuhJS8hxxxx", query: None, fragment: None }), source: Some(Error { kind: TimedOut, message: "connection timed out" }) }))))))"},"target":"daily_core::native::ffi::call_client"}
{"timestamp":"2024-07-11T02:54:44.734919Z","level":"ERROR","fields":{"message":"sendAppMessage (request 3) encountered an error: AppMessage(Signalling(ChannelNotOpen))"},"target":"daily_core::native::ffi::call_client"}

Deepgram STT

Any plans to add Deepgram Speech-to-Text? It performs better than Whisper in many cases.
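
For what it's worth, the audio-pipeline issue earlier on this page already constructs a DeepgramSTTService, so support may already have landed; a sketch of that usage:

import os

from pipecat.services.deepgram import DeepgramSTTService

# Mirrors the construction shown in the audio-pipeline issue above.
stt = DeepgramSTTService(os.environ["DEEPGRAM_API_KEY"])

# It then slots into a pipeline between the transport input and the LLM
# aggregator, e.g. Pipeline([transport.input(), stt, tma_in, llm, ...]).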

About error of deploying the twilio chatbot

Hi There, I came across an error after all deployments were in place and it seems that I need a full Twilio account instead of a trial one.

Is that correct to fully use this chatbot powered by LLMs?

Thanks.

Simpler context management for LLMs

As mentioned in Discord, Pipecat bots don't automatically maintain an LLM context object for you; instead, you have to use a bunch of aggregators and other tools to manage that yourself.

We originally built that because there were plenty of use cases where you didn't necessarily want to automatically store everything the user or bot said in the context, but we should consider adding an 'easy mode' that does this for you.
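
For context, the "manual" wiring in question looks like the audio-pipeline example earlier on this page (transport, stt, llm, and tts are assumed to be constructed as shown there):

from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)

# Shared message list that both aggregators mutate in place.
messages = [{"role": "system", "content": "You are a helpful bot."}]

# One aggregator collects user turns on the way into the LLM; the other
# collects the assistant's spoken output on the way back.
tma_in = LLMUserResponseAggregator(messages)
tma_out = LLMAssistantResponseAggregator(messages)

pipeline = Pipeline(
    [transport.input(), stt, tma_in, llm, tts, transport.output(), tma_out]
)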

[Feature request] Groq transcriptions support

Here are the docs for the new Whisper models available in Groq Cloud. I'd like to replace my Deepgram STT provider with the Groq transcription service. It would be great if you could add this integration. 🙏
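
Until a first-class integration lands, one hedged stopgap is to call Groq's OpenAI-compatible endpoint directly with the openai client (the base URL and model name below are from Groq's docs; verify them before relying on this):

from openai import OpenAI

# Groq's API is OpenAI-compatible, so the standard client can point at it.
client = OpenAI(
    api_key="gsk_...",  # Groq API key
    base_url="https://api.groq.com/openai/v1",
)

with open("audio.wav", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=f,
    )
print(transcription.text)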

Deployment examples: What do you want to see?

We want to build out some code and examples to make it easier to deploy Pipecat bots. Right now, we're looking at the following services:

  • Hugging Face Spaces
  • runpod.io
  • fly.io
  • Heroku
  • Cloudflare Workers

Right now, the basic structure looks like a Dockerfile running a FastAPI server that serves a static HTML frontend UI. That server also spins up bots on demand in threads. We're still figuring out how to handle Cloudflare Workers.

Where else do you want to be able to run Pipecat bots? Is there another approach that makes more sense to you?

Question: Storytelling example: Why toggle setLocalAudio() every turn?

First up, brilliant project! Still trying to wrap my head around the concept, but making some baby steps.

My question is: Why do we have to toggle recording when at the same time the "interruptible" feature is demoed in other examples?

if (e.data?.cue === "user_turn") {
  // Delay enabling local mic input to avoid feedback from LLM
  setTimeout(() => daily.setLocalAudio(true), 500);
  setStoryState("user");
} else {
  daily.setLocalAudio(false);
  setStoryState("assistant");
}

Is the "always listening" feature a unique one of the Daily default video call screen which does not apply to custom apps? Would it record the audio of the TTS output (which is a typical problem of any speech-to-speech systems I have written myself)?

Thanks for clarification, looking forward to v0.0.24 :)

Multiple function calls openai

In the file services/openai.py, why is line 147 function_name += tool_call.function.name and not function_name = tool_call.function.name?

In particular, I am getting the following error:
Uncaught exception in LLMUserContextAggregator#0: The LLM tried to call a function named 'collect_namecollect_agecollect_payment_method', but there isn't a callback registered for that function.

It happens whenever I execute a function. For example, after giving my full name the flow goes on to collecting my age, and then I try to correct my full name by saying "oh, I am sorry, my name is not {previous}, it is {new_name}". I have noticed that in the patient-intake example, every function call modifies the context and allows just one function tool. Can I keep a context with many function tools, so they can be called more than once during the flow of events?
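
The += is there to stitch together the streamed chunks of a single call, but as written it also concatenates the names of different tool calls. A hedged sketch of accumulating per tool_call.index instead, using the OpenAI streaming delta shape (stream is assumed to be a streaming chat-completion response):

# Accumulate streamed tool calls keyed by index so separate calls don't
# get concatenated into one name like 'collect_namecollect_age'.
tool_calls: dict[int, dict] = {}

for chunk in stream:
    delta = chunk.choices[0].delta
    if not delta.tool_calls:
        continue
    for tc in delta.tool_calls:
        entry = tool_calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function.name:
            entry["name"] += tc.function.name
        if tc.function.arguments:
            entry["arguments"] += tc.function.arguments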

GPT function calling

Re-enable function calling in GPTs, and add a foundational example showing how it works.
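
A minimal sketch of what that foundational example might cover, assuming the callback-registration mechanism implied by the "no callback registered" error in the previous issue (register_function, the callback signature, and llm being an LLM service instance are all assumptions, not confirmed API):

import json

# OpenAI-style tool definition passed to the LLM service / context.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

async def get_weather(args):
    # Hypothetical callback: return a JSON result the LLM can speak back.
    return json.dumps({"city": args["city"], "temp_c": 21})

# Hypothetical wiring: register the callback under the tool's name so the
# aggregator can dispatch to it instead of raising "no callback registered".
llm.register_function("get_weather", get_weather)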

import error

______________________________________ ERROR collecting src/pipecat/pipeline/merge_pipeline.py ______________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/pipeline/merge_pipeline.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/pipeline/merge_pipeline.py:2: in <module>
from pipecat.pipeline.frames import EndFrame, EndPipeFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.frames'
_____________________________ ERROR collecting src/pipecat/processors/aggregators/openai_llm_context.py _____________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/processors/aggregators/openai_llm_context.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/processors/aggregators/openai_llm_context.py:13: in <module>
from openai._types import NOT_GIVEN, NotGiven
E ModuleNotFoundError: No module named 'openai'
_______________________________ ERROR collecting src/pipecat/serializers/abstract_frame_serializer.py _______________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/serializers/abstract_frame_serializer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/serializers/abstract_frame_serializer.py:3: in <module>
from pipecat.pipeline.frames import Frame
E ModuleNotFoundError: No module named 'pipecat.pipeline.frames'
__________________________________ ERROR collecting src/pipecat/serializers/protobuf_serializer.py __________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/serializers/protobuf_serializer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/serializers/protobuf_serializer.py:3: in <module>
from pipecat.pipeline.frames import AudioFrame, Frame, TextFrame, TranscriptionFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.frames'
__________________________________________ ERROR collecting src/pipecat/services/azure.py ___________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/services/azure.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/services/azure.py:14: in <module>
from openai import AsyncAzureOpenAI
E ModuleNotFoundError: No module named 'openai'
________________________________________ ERROR collecting src/pipecat/services/fireworks.py _________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/services/fireworks.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/services/fireworks.py:7: in <module>
from pipecat.services.openai import BaseOpenAILLMService
src/pipecat/services/openai.py:25: in <module>
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext, OpenAILLMContextFrame
src/pipecat/processors/aggregators/openai_llm_context.py:13: in <module>
from openai._types import NOT_GIVEN, NotGiven
E ModuleNotFoundError: No module named 'openai'
__________________________________________ ERROR collecting src/pipecat/services/ollama.py __________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/services/ollama.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/services/ollama.py:7: in <module>
from pipecat.services.openai import BaseOpenAILLMService
src/pipecat/services/openai.py:25: in <module>
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext, OpenAILLMContextFrame
src/pipecat/processors/aggregators/openai_llm_context.py:13: in <module>
from openai._types import NOT_GIVEN, NotGiven
E ModuleNotFoundError: No module named 'openai'
__________________________________________ ERROR collecting src/pipecat/services/openai.py __________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/services/openai.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/services/openai.py:25: in <module>
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext, OpenAILLMContextFrame
src/pipecat/processors/aggregators/openai_llm_context.py:13: in <module>
from openai._types import NOT_GIVEN, NotGiven
E ModuleNotFoundError: No module named 'openai'
_____________________________________ ERROR collecting src/pipecat/transports/services/daily.py _____________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/transports/services/daily.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
src/pipecat/transports/services/daily.py:18: in <module>
from daily import (
E ImportError: cannot import name 'CallClient' from partially initialized module 'daily' (most likely due to a circular import) (/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/src/pipecat/transports/services/daily.py)
____________________________________ ERROR collecting tests/integration/integration_azure_llm.py ____________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/integration/integration_azure_llm.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/integration_azure_llm.py:3: in <module>
from pipecat.pipeline.openai_frames import OpenAILLMContextFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.openai_frames'
___________________________________ ERROR collecting tests/integration/integration_ollama_llm.py ____________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/integration/integration_ollama_llm.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/integration_ollama_llm.py:2: in <module>
from pipecat.pipeline.openai_frames import OpenAILLMContextFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.openai_frames'
___________________________________ ERROR collecting tests/integration/integration_openai_llm.py ____________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/integration/integration_openai_llm.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/integration/integration_openai_llm.py:3: in <module>
from pipecat.pipeline.openai_frames import OpenAILLMContextFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.openai_frames'
____________________________________________ ERROR collecting tests/test_aggregators.py _____________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_aggregators.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_aggregators.py:6: in <module>
from pipecat.pipeline.aggregators import (
E ModuleNotFoundError: No module named 'pipecat.pipeline.aggregators'
____________________________________________ ERROR collecting tests/test_aggregators.py _____________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_aggregators.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_aggregators.py:6: in <module>
from pipecat.pipeline.aggregators import (
E ModuleNotFoundError: No module named 'pipecat.pipeline.aggregators'
____________________________________________ ERROR collecting tests/test_ai_services.py _____________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_ai_services.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_ai_services.py:6: in <module>
from pipecat.pipeline.frames import EndFrame, Frame, TextFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.frames'
____________________________________________ ERROR collecting tests/test_ai_services.py _____________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_ai_services.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_ai_services.py:6: in <module>
from pipecat.pipeline.frames import EndFrame, Frame, TextFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.frames'
______________________________________________ ERROR collecting tests/test_pipeline.py ______________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_pipeline.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_pipeline.py:5: in <module>
from pipecat.pipeline.aggregators import SentenceAggregator, StatelessTextTransformer
E ModuleNotFoundError: No module named 'pipecat.pipeline.aggregators'
______________________________________________ ERROR collecting tests/test_pipeline.py ______________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_pipeline.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_pipeline.py:5: in <module>
from pipecat.pipeline.aggregators import SentenceAggregator, StatelessTextTransformer
E ModuleNotFoundError: No module named 'pipecat.pipeline.aggregators'
________________________________________ ERROR collecting tests/test_protobuf_serializer.py _________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_protobuf_serializer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_protobuf_serializer.py:3: in <module>
from pipecat.pipeline.frames import AudioFrame, TextFrame, TranscriptionFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.frames'
________________________________________ ERROR collecting tests/test_protobuf_serializer.py _________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_protobuf_serializer.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_protobuf_serializer.py:3: in <module>
from pipecat.pipeline.frames import AudioFrame, TextFrame, TranscriptionFrame
E ModuleNotFoundError: No module named 'pipecat.pipeline.frames'
________________________________________ ERROR collecting tests/test_websocket_transport.py _________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_websocket_transport.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_websocket_transport.py:7: in <module>
from pipecat.transports.websocket_transport import WebSocketFrameProcessor, WebsocketTransport
E ModuleNotFoundError: No module named 'pipecat.transports.websocket_transport'
________________________________________ ERROR collecting tests/test_websocket_transport.py _________________________________________
ImportError while importing test module '/mnt/c/Users/DELL/Documents/pipchatbot/pipecat/tests/test_websocket_transport.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/loop/miniconda3/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_websocket_transport.py:7: in <module>
from pipecat.transports.websocket_transport import WebSocketFrameProcessor, WebsocketTransport
E ModuleNotFoundError: No module named 'pipecat.transports.websocket_transport'
====================================================== short test summary info ======================================================
ERROR src/pipecat/pipeline/merge_pipeline.py
ERROR src/pipecat/processors/aggregators/openai_llm_context.py
ERROR src/pipecat/serializers/abstract_frame_serializer.py
ERROR src/pipecat/serializers/protobuf_serializer.py
ERROR src/pipecat/services/azure.py
ERROR src/pipecat/services/fireworks.py
ERROR src/pipecat/services/ollama.py
ERROR src/pipecat/services/openai.py
ERROR src/pipecat/transports/services/daily.py
ERROR tests/integration/integration_azure_llm.py
ERROR tests/integration/integration_ollama_llm.py
ERROR tests/integration/integration_openai_llm.py
ERROR tests/test_aggregators.py
ERROR tests/test_aggregators.py
ERROR tests/test_ai_services.py
ERROR tests/test_ai_services.py
ERROR tests/test_pipeline.py
ERROR tests/test_pipeline.py
ERROR tests/test_protobuf_serializer.py
ERROR tests/test_protobuf_serializer.py
ERROR tests/test_websocket_transport.py
ERROR tests/test_websocket_transport.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 22 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

class DailyTransport(BaseTransport): how to enable the bot to screen share or send a chat message while we are talking to the user

Hello.
A question has been on my mind: I want to display the content the bot is collecting (the information it gathers while talking to the user). Is there any way to show a file through screen sharing or chat messages?

Screen sharing is preferable, since we want to display images or PDFs, but we could also work with chat messages that carry a simple JSON payload.
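
One possible direction, as a minimal sketch rather than a confirmed pipecat feature: the underlying daily-python client exposes CallClient.send_app_message(), which broadcasts a JSON-serializable payload to the other participants, and a custom web client could render that payload as a chat message or an inline document viewer. The room URL and payload below are placeholders.

from daily import CallClient, Daily

Daily.init()
client = CallClient()
# join() is asynchronous; real code should wait for its completion callback
# before sending anything. The room URL here is a placeholder.
client.join("https://YOUR_DOMAIN.daily.co/YOUR_ROOM")

# Broadcast the info the bot has collected; the web client decides how to show it.
client.send_app_message({"type": "collected-info", "data": {"topic": "demo"}})

client.leave()

Note that app messages only carry data; actually screen sharing a rendered image or PDF would additionally need client-side rendering or a custom video track.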

13 - Whisper example fails - 'resource_tracker: There appear to be %d '

I'm running 13-whisper-transcription.py.

I made one modification to it, adding:
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

To overcome this error:
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Now, I see this cryptic error:

20 2024-03-19 00:23:29,580 🎬 Starting frame consumer thread
TRANSCRIPTION
20 2024-03-19 00:23:29,581 Call_joined: {'participants': ..... [DETAILS OF THE CALL]

zsh: abort python 13-whisper-transcription.py -u -k

/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

cannot import LLMFullResponseEndFrame

When trying v0.0.18, this error is raised on loading the story page:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/jwee/projects/pipecat/examples/storytelling-chatbot/src/bot.py", line 18, in <module>
    from processors import StoryProcessor, StoryImageProcessor
  File "/home/jwee/projects/pipecat/examples/storytelling-chatbot/src/processors.py", line 5, in <module>
    from pipecat.frames.frames import (
ImportError: cannot import name 'LLMFullResponseEndFrame' from 'pipecat.frames.frames' (/home/jwee/envs/alfabert/lib/python3.10/site-packages/pipecat/frames/frames.py)

storytelling-chatbot example not starting user input

After following the readme of the storytelling-chatbot example, I see the narrator start the story, but the user-input part never starts: the microphone does not become active.

The readme mentions Deepgram, but I cannot find it being used anywhere in the code. I want to help improve the example, but I am stuck. Is daily.co supposed to stream/transcribe the user input?

Does anyone know how to take the next step?
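
For context, a minimal sketch of how Daily's built-in transcription is typically switched on, which would explain why Deepgram appears in the readme but not in the bot code (Daily's transcription service is Deepgram-powered and runs server-side). DailyTranscriptionSettings and its field names are assumptions inferred from the transcription log in a later issue on this page; check them against your pipecat version.

from pipecat.transports.services.daily import (
    DailyParams,
    DailyTranscriptionSettings,
    DailyTransport,
)

room_url = "https://YOUR_DOMAIN.daily.co/YOUR_ROOM"  # placeholder
token = "YOUR_DAILY_TOKEN"  # placeholder

transport = DailyTransport(
    room_url,
    token,
    "Storybot",
    DailyParams(
        audio_out_enabled=True,
        transcription_enabled=True,  # Daily transcribes participant audio server-side
        transcription_settings=DailyTranscriptionSettings(
            language="en",
            tier="nova",
            model="2-conversationalai",
        ),
    ),
)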

TypeError: BaseTransportService.run() takes 1 positional argument but 2 were given

I'm running 01-say-one-thing.py - having made no modifications to it, just passing it a URL and token. I see this cryptic error:

File "/Users/ddd/daily-ai-sdk/src/examples/foundational/01-say-one-thing.py", line 51, in
asyncio.run(main(url))
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/ddd/daily-ai-sdk/src/examples/foundational/01-say-one-thing.py", line 45, in main
await transport.run(pipeline)
^^^^^^^^^^^^^^^^^^^^^^^
TypeError: BaseTransportService.run() takes 1 positional argument but 2 were given

add dotenv to examples

Most of the examples, for instance 06a-image-sync.py, need dotenv imported to load the variables from the .env file:

from dotenv import load_dotenv
load_dotenv()

Without it, the environment variables resolve to None and you get the following non-str key serialization error:

foundational$ python 06a-image-sync.py -u "[MY ROOM URL]" -k "[MY KEY]"
Using cache found in /Users/ddd/.cache/torch/hub/snakers4_silero-vad_master

20 2024-03-19 00:37:03,467 🎬 Starting frame consumer thread
--- Logging error ---
Traceback (most recent call last):
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/site-packages/dailyai/services/ai_services.py", line 54, in run
async for output_frame in self.process_frame(frame):
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/site-packages/dailyai/services/ai_services.py", line 115, in process_frame
async for audio_chunk in self.run_tts(text):
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/site-packages/dailyai/services/elevenlabs_ai_service.py", line 34, in run_tts
async with self._aiohttp_session.post(
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/site-packages/aiohttp/client.py", line 1187, in aenter
self._resp = await self._coro
^^^^^^^^^^^^^^^^
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/site-packages/aiohttp/client.py", line 599, in _request
resp = await req.send(conn)
^^^^^^^^^^^^^^^^^^^^
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 712, in send
await writer.write_headers(status_line, self.headers)
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/site-packages/aiohttp/http_writer.py", line 129, in write_headers
buf = _serialize_headers(status_line, headers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "aiohttp/_http_writer.pyx", line 132, in aiohttp._http_writer._serialize_headers
File "aiohttp/_http_writer.pyx", line 109, in aiohttp._http_writer.to_str
TypeError: Cannot serialize non-str key None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/logging/init.py", line 1110, in emit
msg = self.format(record)
^^^^^^^^^^^^^^^^^^^
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/logging/init.py", line 953, in format
return fmt.format(record)
^^^^^^^^^^^^^^^^^^
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/logging/init.py", line 687, in format
record.message = record.getMessage()
^^^^^^^^^^^^^^^^^^^
File "/Users/ddd/opt/anaconda3/envs/py311/lib/python3.11/logging/init.py", line 377, in getMessage
msg = msg % self.args
~~~~^~~~~~~~~~~
TypeError: not all arguments converted during string formatting

storybot does not work

I ran the storybot example, but it does not work. I also tried it online at https://storytelling-chatbot.fly.dev/, with the same result.
There is no more info in the log:

2024-07-03 11:11:32.787 | INFO     | pipecat.transports.services.daily:join:268 - Enabling transcription with settings language='en' tier='nova' model='2-conversationalai' profanity_filter=True redact=False endpointing=True punctuate=True includeRawResponse=True extra={'interim_results': True}
2024-07-03 11:11:33.110 | DEBUG    | pipecat.transports.services.daily:on_transcription_started:482 - Transcription started: {'model': '2-conversationalai', 'transcriptId': 'aae9d1ec-c552-4d2c-a542-056674890de4', 'startedBy': '9f6a5f95-d0d4-4c96-b3c7-9f1006fe0d52', 'tier': 'nova', 'language': 'en'}
2024-07-03 11:11:33.243 | INFO     | pipecat.transports.services.daily:on_participant_joined:455 - Participant joined 6edd95bd-d4f4-4272-ab5c-f482247d2337
2024-07-03 11:11:33.244 | DEBUG    | __main__:on_first_participant_joined:111 - Participant joined, storytime commence!
2024-07-03 11:11:33.244 | DEBUG    | pipecat.services.openai:_stream_chat_completions:96 - Generating chat: [{"content": "You are a creative storyteller who loves to tell whimsical, fantastical stories.         Your goal is to craft an engaging and fun story.         Start by asking the user what kind of story they'd like to hear. Don't provide any examples.         Keep your response to only a few sentences.", "role": "system", "name": "system"}]
/{path_name:path}
INFO:     10.16.39.2:42872 - "GET /alpha-mask.gif HTTP/1.1" 200 OK
2024-07-03 11:11:34.361 | DEBUG    | pipecat.services.openai:run_tts:313 - Generating TTS: [, I am ready to begin. What type of story would you like to hear?]
2024-07-03 11:11:36.942 | DEBUG    | pipecat.pipeline.runner:run:32 - Runner PipelineRunner#0 finished running PipelineTask#0
2024-07-03 11:11:36.942 | DEBUG    | pipecat.transports.services.daily:__init__:70 - Loaded native WebRTC VAD
2024-07-03 11:11:36.942 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking PipelineSource#1 -> DailyInputTransport#0
2024-07-03 11:11:36.942 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking DailyInputTransport#0 -> LLMUserResponseAggregator#0
2024-07-03 11:11:36.942 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking LLMUserResponseAggregator#0 -> OpenAILLMService#0
2024-07-03 11:11:36.942 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking OpenAILLMService#0 -> StoryProcessor#0
2024-07-03 11:11:36.942 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking StoryProcessor#0 -> StoryImageProcessor#0
2024-07-03 11:11:36.942 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking StoryImageProcessor#0 -> OpenAITTSService#0
2024-07-03 11:11:36.942 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking OpenAITTSService#0 -> DailyOutputTransport#0
2024-07-03 11:11:36.943 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking DailyOutputTransport#0 -> LLMAssistantResponseAggregator#0
2024-07-03 11:11:36.943 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking LLMAssistantResponseAggregator#0 -> PipelineSink#1
2024-07-03 11:11:36.943 | DEBUG    | pipecat.processors.frame_processor:link:132 - Linking Source#1 -> Pipeline#1
2024-07-03 11:11:36.943 | DEBUG    | pipecat.pipeline.runner:run:28 - Runner PipelineRunner#0 started running PipelineTask#1
2024-07-03 11:11:45.274 | DEBUG    | pipecat.transports.services.daily:_on_transcription_message:879 - Transcription (from: 6edd95bd-d4f4-4272-ab5c-f482247d2337): [Fanny 

[Question] Example request: LocalAudioTransport + Whisper + LLM + TTS

Hi 👋

I am having trouble running a local example that integrates LocalAudioTransport, WhisperSTTService, ElevenLabsTTSService, and OpenAILLMService.

I have successfully managed to run Whisper locally for transcription and another script that uses Eleven Labs and OpenAI for TTS and LLM services, respectively. However, I am struggling to combine these components to create a fully functional local conversation system.

To illustrate, here are the two examples I have working independently:

Example 1: Passing an LLM message to the TTS provider:

import asyncio
import os
import sys

import aiohttp
from loguru import logger
from pipecat.frames.frames import EndFrame, LLMMessagesFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


async def main():
    async with aiohttp.ClientSession() as session:
        transport = LocalAudioTransport(TransportParams(audio_out_enabled=True))

        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-3.5-turbo-0125",
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way.",
            },
        ]

        pipeline = Pipeline([llm, tts, transport.output()])

        task = PipelineTask(pipeline)

        async def say_something():
            await asyncio.sleep(1)
            await task.queue_frames([LLMMessagesFrame(messages), EndFrame()])

        runner = PipelineRunner()

        await asyncio.gather(runner.run(task), say_something())


if __name__ == "__main__":
    asyncio.run(main())

Example 2: Using Whisper locally:

import asyncio
import sys

from loguru import logger
from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.whisper import Model, WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")


class TranscriptionLogger(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        if isinstance(frame, TranscriptionFrame):
            print(f"Transcription: {frame.text}")


async def main():
    transport = LocalAudioTransport(TransportParams(audio_in_enabled=True))

    stt = WhisperSTTService()

    tl = TranscriptionLogger()

    pipeline = Pipeline([transport.input(), stt, tl])

    task = PipelineTask(pipeline)

    runner = PipelineRunner()

    await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

Despite these individual successes, I'm unable to connect the transcriptions with the LLM and have a continuous conversation. Could you provide or add an example of a fully working local setup that demonstrates how to achieve this?

Thank you!
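
For anyone landing here, a minimal sketch of one way to wire the two examples above together. It assumes the response aggregators are importable from pipecat.processors.aggregators.llm_response (the class names LLMUserResponseAggregator and LLMAssistantResponseAggregator appear in the storybot log above); it is untested, so treat it as a starting point rather than a definitive implementation.

import asyncio
import os

import aiohttp
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantResponseAggregator,
    LLMUserResponseAggregator,
)
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.openai import OpenAILLMService
from pipecat.services.whisper import WhisperSTTService
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport


async def main():
    async with aiohttp.ClientSession() as session:
        # One transport with both microphone input and speaker output enabled.
        transport = LocalAudioTransport(
            TransportParams(audio_in_enabled=True, audio_out_enabled=True))

        stt = WhisperSTTService()
        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-3.5-turbo-0125",
        )
        tts = ElevenLabsTTSService(
            aiohttp_session=session,
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
        )

        messages = [
            {
                "role": "system",
                "content": "You are a helpful assistant. Keep your answers short.",
            },
        ]

        # The aggregators maintain the shared message history: the user side
        # turns TranscriptionFrames into LLM context, the assistant side
        # appends the model's replies after they have been spoken.
        user_agg = LLMUserResponseAggregator(messages)
        assistant_agg = LLMAssistantResponseAggregator(messages)

        pipeline = Pipeline([
            transport.input(),   # microphone audio in
            stt,                 # Whisper transcription
            user_agg,            # transcription -> LLM messages
            llm,                 # streamed LLM response
            tts,                 # text -> audio
            transport.output(),  # speaker audio out
            assistant_agg,       # record the assistant's reply
        ])

        runner = PipelineRunner()
        await runner.run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main())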
