Add files using upload-large-folder tool
Browse files
- audio_koala-filter_2508643f.txt +5 -0
- base-classes_text_9b014637.txt +5 -0
- c_introduction_776ac205.txt +5 -0
- client_rtvi-standard_2ec3359c.txt +5 -0
- client_rtvi-standard_ca56cef1.txt +5 -0
- deployment_pattern_006d5271.txt +5 -0
- deployment_pattern_d0e5580a.txt +5 -0
- deployment_pipecat-cloud_547c2458.txt +5 -0
- features_pipecat-flows_496dfcd1.txt +5 -0
- filters_wake-notifier-filter_848bf42d.txt +5 -0
- flows_pipecat-flows_8ba305e4.txt +5 -0
- fundamentals_function-calling_48720366.txt +5 -0
- fundamentals_recording-audio_ce92d889.txt +5 -0
- image-generation_google-imagen_e955ad4d.txt +5 -0
- llm_aws_d3b6b447.txt +5 -0
- llm_cerebras_35edb40b.txt +5 -0
- llm_fireworks_b0e46e86.txt +5 -0
- pipeline_pipeline-task_cc5b3be0.txt +5 -0
- pipeline_pipeline-task_f0365874.txt +5 -0
- react_hooks_a5603daf.txt +5 -0
- react_migration-guide_bc4b3f84.txt +5 -0
- s2s_openai_25a097b7.txt +5 -0
- serializers_plivo_7c6dac6c.txt +5 -0
- server_utilities_e939aab0.txt +5 -0
- stt_aws_685577e5.txt +5 -0
- stt_cartesia_8778939c.txt +5 -0
- stt_riva_e59eeddb.txt +5 -0
- telephony_daily-webrtc_03c1fa84.txt +5 -0
- transport_daily_3688deee.txt +5 -0
- transport_fastapi-websocket_03bba556.txt +5 -0
- transport_fastapi-websocket_dee48b44.txt +5 -0
- transport_small-webrtc_222927a9.txt +5 -0
- transport_websocket-server_63af9649.txt +5 -0
- transports_gemini-websocket_a7327b57.txt +5 -0
- tts_aws_0db50b6f.txt +5 -0
- tts_aws_32c7001f.txt +5 -0
- tts_elevenlabs_d2c244cd.txt +5 -0
- tts_elevenlabs_dfd65e53.txt +5 -0
- tts_google_7e5164eb.txt +5 -0
- tts_groq_ec887456.txt +5 -0
- tts_minimax_aef6f242.txt +5 -0
- tts_neuphonic_5f36257f.txt +5 -0
- tts_neuphonic_792be297.txt +5 -0
- tts_playht_24fee44a.txt +5 -0
- tts_sarvam_50614782.txt +5 -0
- tts_sarvam_920aa54f.txt +5 -0
- utilities_opentelemetry_1f6781e7.txt +5 -0
- utilities_opentelemetry_bae42f8b.txt +5 -0
- utilities_opentelemetry_ed334555.txt +5 -0
- utilities_transcript-processor_72e73b27.txt +5 -0
audio_koala-filter_2508643f.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/koala-filter#installation
2 + Title: KoalaFilter - Pipecat
3 + ==================================================
4 +
5 +
Overview

KoalaFilter is an audio processor that reduces background noise in real-time audio streams using Koala Noise Suppression technology from Picovoice. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise.

To use Koala, you need a Picovoice access key. Get started at the Picovoice Console.

Installation

The Koala filter requires additional dependencies:

pip install "pipecat-ai[koala]"

You’ll also need to set up your Koala access key as an environment variable: KOALA_ACCESS_KEY

Constructor Parameters

- access_key (str, required): Picovoice access key for using the Koala noise suppression service

Input Frames

- FilterEnableFrame (Frame): Specific control frame to toggle filtering on/off

from pipecat.frames.frames import FilterEnableFrame

# Disable noise reduction
await task.queue_frame(FilterEnableFrame(False))

# Re-enable noise reduction
await task.queue_frame(FilterEnableFrame(True))

Usage Example

from pipecat.audio.filters.koala_filter import KoalaFilter

transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        audio_in_filter=KoalaFilter(access_key=os.getenv("KOALA_ACCESS_KEY")),  # Enable Koala noise reduction
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)

Audio Flow

Notes

- Requires Picovoice access key
- Supports real-time audio processing
- Handles 16-bit PCM audio format
- Can be dynamically enabled/disabled
- Maintains audio quality while reducing noise
- Efficient processing for low latency
- Automatically handles audio frame buffering
- Sample rate must match Koala’s required sample rate
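The two documented snippets above can be combined; the following is a minimal sketch (not part of the scraped page) that wires the filter into a transport and toggles it at runtime. Only KoalaFilter, FilterEnableFrame, DailyTransport, DailyParams, and SileroVADAnalyzer come from the documented examples; the import paths, the environment-variable placeholders, and the helper function are assumptions and may differ by Pipecat version.

import os

from pipecat.audio.filters.koala_filter import KoalaFilter
from pipecat.frames.frames import FilterEnableFrame
# Transport/VAD imports as used in the docs' examples; module paths may vary by version.
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams, DailyTransport

# Placeholder connection details (assumed environment variables).
room_url = os.getenv("DAILY_ROOM_URL", "https://example.daily.co/room")
token = os.getenv("DAILY_TOKEN", "")

# Build the transport with Koala noise suppression applied to incoming audio.
transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        audio_in_filter=KoalaFilter(access_key=os.getenv("KOALA_ACCESS_KEY")),
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)

async def set_noise_reduction(task, enabled: bool):
    """Toggle Koala filtering on an existing PipelineTask (assumed to exist elsewhere)."""
    await task.queue_frame(FilterEnableFrame(enabled))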
base-classes_text_9b014637.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/base-classes/text#text-filters
2 + Title: Overview - Pipecat
3 + ==================================================
4 +
5 +
Overview

Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results.

What You Can Build

- Voice Assistants: Natural, real-time conversations with AI using speech recognition and synthesis
- Interactive Agents: Personal coaches and meeting assistants that can understand context and provide guidance
- Multimodal Apps: Applications that combine voice, video, images, and text for rich interactions
- Creative Tools: Storytelling experiences and social companions that engage users
- Business Solutions: Customer intake flows and support bots for automated business processes
- Complex Flows: Structured conversations using Pipecat Flows for managing complex interactions

How It Works

The flow of interactions in a Pipecat application is typically straightforward: the bot says something, the user says something, the bot says something, the user says something. This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing.

Real-time Processing

Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system:

Voice app:
1. Send Audio: Transmit and capture streamed audio from the user
2. Transcribe Speech: Convert speech to text as the user is talking
3. Process with LLM: Generate responses using a large language model
4. Convert to Speech: Transform text responses into natural speech
5. Play Audio: Stream the audio response back to the user

(A minimal code sketch of this voice pipeline appears at the end of this page’s text.)

Multimodal app:
1. Send Audio and Video: Transmit and capture audio, video, and image inputs simultaneously
2. Process Streams: Handle multiple input streams in parallel
3. Model Processing: Send combined inputs to multimodal models (like GPT-4V)
4. Generate Outputs: Create various outputs (text, images, audio, etc.)
5. Coordinate Presentation: Synchronize and present multiple output types

In both cases, Pipecat:
- Processes responses as they stream in
- Handles multiple input/output modalities concurrently
- Manages resource allocation and synchronization
- Coordinates parallel processing tasks

This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure.

Next Steps

Ready to build your first Pipecat application?

- Installation & Setup: Prepare your environment and install required dependencies
- Quickstart: Build and run your first Pipecat application
- Core Concepts: Learn about pipelines, frames, and real-time processing
- Use Cases: Explore example implementations and patterns

Join Our Community

Discord Community: Connect with other developers, share your projects, and get support from the Pipecat team.
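To make the voice-app flow above concrete, here is a minimal sketch (not from the scraped page) of a Pipecat pipeline in the STT -> LLM -> TTS shape described. The service classes, constructor arguments, and import paths are assumptions based on typical Pipecat usage and may differ between versions; transport setup is omitted, and API keys and the voice ID are placeholders.

import asyncio
import os

# Assumed import paths; these have changed across Pipecat releases.
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService

async def main():
    # Steps 1-2: transcribe incoming speech (a transport's input() would normally feed this).
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
    # Step 3: generate responses with a large language model.
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
    # Step 4: convert the response text to speech (voice_id is a placeholder).
    tts = CartesiaTTSService(api_key=os.getenv("CARTESIA_API_KEY"), voice_id="your-voice-id")
    # Step 5: a transport's output() would normally play the audio back to the user.
    pipeline = Pipeline([stt, llm, tts])
    task = PipelineTask(pipeline)
    await PipelineRunner().run(task)

if __name__ == "__main__":
    asyncio.run(main())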
c_introduction_776ac205.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/client/c++/introduction#dependencies
2 + Title: SDK Introduction - Pipecat
3 + ==================================================
4 +
5 +
The Pipecat C++ SDK provides a native implementation for building voice and multimodal AI applications. It supports:

- Linux (x86_64 and aarch64)
- macOS (aarch64)
- Windows (x86_64)

Dependencies

libcurl

The SDK uses libcurl for HTTP requests.

Linux:

sudo apt-get install libcurl4-openssl-dev

On macOS libcurl is already included, so there is nothing to install. On Windows we use vcpkg to install dependencies. You need to set it up following one of the tutorials. The libcurl dependency will be automatically downloaded when building.

Installation

Build the SDK using CMake.

Linux/macOS:

cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release
ninja -C build

Windows:

# Initialize Visual Studio environment
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Auxiliary\Build\vcvarsall.bat" amd64
# Configure and build
cmake . -Bbuild --preset vcpkg
cmake --build build --config Release

Cross-compilation

For Linux aarch64:

cmake . -G Ninja -Bbuild -DCMAKE_TOOLCHAIN_FILE=aarch64-linux-toolchain.cmake -DCMAKE_BUILD_TYPE=Release
ninja -C build

Documentation

- API Reference: Complete SDK API documentation
- Daily Transport: WebRTC implementation using Daily
client_rtvi-standard_2ec3359c.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/client/rtvi-standard#server-response-%F0%9F%A4%96
2 + Title: The RTVI Standard - Pipecat
3 + ==================================================
4 +
5 +
The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real time. This page documents version 1.0 of the RTVI standard, released in June 2025.

Key Features

- Connection Management: RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state.
- Transcriptions: The standard includes built-in support for real-time transcription of audio streams.
- Client-Server Messaging: The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses.
- Advanced LLM Interactions: The standard supports advanced interactions with large language models (LLMs), including context management, function call handling, and search results.
- Service-Specific Insights: RTVI supports events to provide insight into the input/output and state for typical services that exist in speech-to-speech workflows.
- Metrics and Monitoring: RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services.

Terms

- Client: The front-end application or user interface that interacts with the RTVI server.
- Server: The backend service that runs the AI framework and processes requests from the client.
- User: The end user interacting with the client application.
- Bot: The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service.

RTVI Message Format

The messages defined as part of the RTVI protocol adhere to the following format:

{
  "id": string,
  "label": "rtvi-ai",
  "type": string,
  "data": unknown
}

- id (string): A unique identifier for the message, used to correlate requests and responses.
- label (string, required, default "rtvi-ai"): A label that identifies this message as an RTVI message. This field is required and should always be set to 'rtvi-ai'.
- type (string, required): The type of message being sent. This field is required and should be set to one of the predefined RTVI message types listed below.
- data (unknown): The payload of the message, which can be any data structure relevant to the message type.

(A short Python example of constructing and correlating these messages appears at the end of this page’s text.)

RTVI Message Types

Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type below includes either a 🤖 or 🏄 emoji to denote whether the message is sent from the bot (🤖) or client (🏄).

Connection Management

client-ready 🏄
Indicates that the client is ready to receive messages and interact with the server. Typically sent after the transport media channels have connected.
type: 'client-ready'
data:
- version (string): The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations.
- about (AboutClient Object): An object containing information about the client, such as its rtvi-version, client library, and any other relevant metadata. The AboutClient object follows this structure:
  - library (string, required)
  - library_version (string)
  - platform (string)
  - platform_version (string)
  - platform_details (any): Any platform-specific details that may be relevant to the server. This could include information about the browser, operating system, or any other environment-specific data needed by the server. This field is optional and open-ended, so please be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personally identifiable information.

bot-ready 🤖
Indicates that the bot is ready to receive messages and interact with the client. Typically sent after the transport media channels have connected.
type: 'bot-ready'
data:
- version (string): The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations.
- about (any, optional): An object containing information about the server or bot. Its structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time, without any built-in security concerns. Please be mindful of the data you include here and any security concerns that may arise from exposing sensitive information.

disconnect-bot 🏄
Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: Disconnects should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client may want to remain connected to the transport but no longer wishes to interact with the bot.
type: 'disconnect-bot'
data: undefined

error 🤖
Indicates an error occurred during bot initialization or runtime.
type: 'error'
data:
- message (string): Description of the error.
- fatal (boolean): Indicates if the error is fatal to the session.

Transcription

user-started-speaking 🤖
Emitted when the user begins speaking.
type: 'user-started-speaking'
data: None

user-stopped-speaking 🤖
Emitted when the user stops speaking.
type: 'user-stopped-speaking'
data: None

bot-started-speaking 🤖
Emitted when the bot begins speaking.
type: 'bot-started-speaking'
data: None

bot-stopped-speaking 🤖
Emitted when the bot stops speaking.
type: 'bot-stopped-speaking'
data: None

user-transcription 🤖
Real-time transcription of user speech, including both partial and final results.
type: 'user-transcription'
data:
- text (string): The transcribed text of the user.
- final (boolean): Indicates if this is a final transcription or a partial result.
- timestamp (string): The timestamp when the transcription was generated.
- user_id (string): Identifier for the user who spoke.

bot-transcription 🤖
Transcription of the bot’s speech. Note: This protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions. Rather, the event is typically sent for each sentence of the bot’s response. This difference is currently due to limitations in TTS services, which mostly do not support (or support well) accurate timing information. If/when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription to match your bot’s speaking, you can try using the bot-tts-text message type.
type: 'bot-transcription'
data:
- text (string): The transcribed text from the bot, typically aggregated at a per-sentence level.

Client-Server Messaging

server-message 🤖
An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client.
type: 'server-message'
data: any JSON-serializable object, formatted according to your own specifications.

client-message 🏄
An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server.
type: 'client-message'
data:
- t (string)
- d (unknown, optional)
The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message.

server-response 🤖
A message sent from the server to the client in response to a client-message. IMPORTANT: The id should match the id of the original client-message to correlate the response with the request.
type: 'server-response'
data:
- t (string)
- d (unknown, optional)
The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message.

error-response 🤖
Error response to a specific client message. IMPORTANT: The id should match the id of the original client-message to correlate the response with the request.
type: 'error-response'
data:
- error (string)

Advanced LLM Interactions

append-to-context 🏄
A message sent from the client to the server to append data to the context of the current LLM conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant.
type: 'append-to-context'
data:
- role ("user" | "assistant"): The role the context should be appended to. Currently only supports "user" and "assistant".
- content (unknown): The content to append to the context. This can be any data structure the LLM understands.
- run_immediately (boolean, optional): Indicates whether the context should be run immediately after appending. Defaults to false. If set to false, the context will be appended but not executed until the next LLM run.

llm-function-call 🤖
A function call request from the LLM, sent from the bot to the client. Note that for most cases, an LLM function call will be handled completely server-side. However, in the event that the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required.
type: 'llm-function-call'
data:
- function_name (string): Name of the function to be called.
- tool_call_id (string): Unique identifier for this function call.
- args (Record<string, unknown>): Arguments to be passed to the function.

llm-function-call-result 🏄
The result of the function call requested by the LLM, returned from the client.
type: 'llm-function-call-result'
data:
- function_name (string): Name of the called function.
- tool_call_id (string): Identifier matching the original function call.
- args (Record<string, unknown>): Arguments that were passed to the function.
- result (Record<string, unknown> | string): The result returned by the function.

bot-llm-search-response 🤖
Search results from the LLM’s knowledge base. Currently, Google Gemini is the only LLM that supports built-in search. However, we expect other LLMs to follow suit, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies.
type: 'bot-llm-search-response'
data:
- search_result (string, optional): Raw search result text.
- rendered_content (string, optional): Formatted version of the search results.
- origins (Array<Origin Object>): Source information and confidence scores for search results.

The Origin Object follows this structure:

{
  "site_uri": string (optional),
  "site_title": string (optional),
  "results": Array<{ "text": string, "confidence": number[] }>
}

Example:

"id": undefined
"label": "rtvi-ai"
"type": "bot-llm-search-response"
"data": {
  "origins": [
    {
      "results": [
        { "confidence": [0.9881149530410768], "text": "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm." },
        { "confidence": [0.9692034721374512], "text": "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm." }
      ],
      "site_title": "vanderbilt.edu",
      "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi"
    },
    {
      "results": [
        { "confidence": [0.6554043292999268], "text": "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields." }
      ],
      "site_title": "wikipedia.org",
      "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo"
    }
  ],
  "rendered_content": "<style> \n .container ... </div> \n </div> \n ",
  "search_result": "Several events are happening at Vanderbilt University: \n\n * Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm. \n * A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm. \n\n In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields. For the most recent news, you should check Vanderbilt's official news website. \n "
}

Service-Specific Insights

bot-llm-started 🤖
Indicates LLM processing has begun.
type: 'bot-llm-started'
data: None

bot-llm-stopped 🤖
Indicates LLM processing has completed.
type: 'bot-llm-stopped'
data: None

user-llm-text 🤖
Aggregated user input text that is sent to the LLM.
type: 'user-llm-text'
data:
- text (string): The user’s input text to be processed by the LLM.

bot-llm-text 🤖
Individual tokens streamed from the LLM as they are generated.
type: 'bot-llm-text'
data:
- text (string): The token text from the LLM.

bot-tts-started 🤖
Indicates text-to-speech (TTS) processing has begun.
type: 'bot-tts-started'
data: None

bot-tts-stopped 🤖
Indicates text-to-speech (TTS) processing has completed.
type: 'bot-tts-stopped'
data: None

bot-tts-text 🤖
The per-token text output of the text-to-speech (TTS) service (what the TTS actually says).
type: 'bot-tts-text'
data:
- text (string): The text representation of the generated bot speech.

Metrics and Monitoring

metrics 🤖
Performance metrics for various processing stages and services. Each message will contain entries for one or more of the metrics types: processing, ttfb, characters.
type: 'metrics'
data:
- processing (optional): Processing time metrics.
- ttfb (optional): Time to first byte metrics.
- characters (optional): Character processing metrics.

For each metric type, the data structure is an array of objects with the following structure:
- processor (string): The name of the processor or service that generated the metric.
- value (number): The value of the metric, typically in milliseconds or character count.
- model (string, optional): The model of the service that generated the metric, if applicable.

Example:

{
  "type": "metrics",
  "data": {
    "processing": [
      { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 0.0005140304565429688 }
    ],
    "ttfb": [
      { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 0.1573178768157959 }
    ],
    "characters": [
      { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 43 }
    ]
  }
}
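As an illustration of the envelope format and the id-correlation rule described above, here is a small Python sketch (not from the page) that builds a client-message and checks a server-response against it. The helper names and the example payload are hypothetical; only the field names (id, label, type, data, t, d) and the message types come from the standard.

import json
import uuid

RTVI_LABEL = "rtvi-ai"

def make_client_message(msg_type: str, payload) -> dict:
    """Build an RTVI client-message envelope: id, label, type, and a data payload with t/d."""
    return {
        "id": str(uuid.uuid4()),
        "label": RTVI_LABEL,
        "type": "client-message",
        "data": {"t": msg_type, "d": payload},
    }

def is_response_for(request: dict, response: dict) -> bool:
    """Per the standard, a server-response (or error-response) id must match the request id."""
    return (
        response.get("label") == RTVI_LABEL
        and response.get("type") in ("server-response", "error-response")
        and response.get("id") == request.get("id")
    )

# Example usage with a made-up message type and payload.
request = make_client_message("get-weather", {"city": "Nashville"})
print(json.dumps(request))
response = {
    "id": request["id"],
    "label": RTVI_LABEL,
    "type": "server-response",
    "data": {"t": "get-weather", "d": {"temp_f": 72}},
}
assert is_response_for(request, response)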
client_rtvi-standard_ca56cef1.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/client/rtvi-standard#bot-tts-started-%F0%9F%A4%96
2 + Title: The RTVI Standard - Pipecat
3 + ==================================================
4 +
5 +
deployment_pattern_006d5271.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/deployment/pattern
2 + Title: Overview - Pipecat
3 + ==================================================
4 +
5 +
deployment_pattern_d0e5580a.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/deployment/pattern#what-you-can-build
2 + Title: Overview - Pipecat
3 + ==================================================
4 +
5 +
deployment_pipecat-cloud_547c2458.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/deployment/pipecat-cloud#real-time-processing
2 + Title: Overview - Pipecat
3 + ==================================================
4 +
5 +
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
features_pipecat-flows_496dfcd1.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/features/pipecat-flows#strategy-selection
Title: Pipecat Flows - Pipecat
==================================================
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Flows provides a framework for building structured conversations in your AI applications. It enables you to create both predefined conversation paths and dynamically generated flows while handling the complexities of state management and LLM interactions. The framework consists of: A Python module for building conversation flows with Pipecat A visual editor for designing and exporting flow configurations Key Concepts Nodes : Represent conversation states with specific messages and available functions Messages : Set the role and tasks for each node Functions : Define actions and transitions (Node functions for operations, Edge functions for transitions) Actions : Execute operations during state transitions (pre/post actions) State Management : Handle conversation state and data persistence Example Flows Movie Explorer (Static) A static flow demonstrating movie exploration using OpenAI. Shows real API integration with TMDB, structured data collection, and state management. Insurance Policy (Dynamic) A dynamic flow using Google Gemini that adapts policy recommendations based on user responses. Demonstrates runtime node creation and conditional paths. These examples are fully functional and can be run locally. Make sure you have the required dependencies installed and API keys configured. When to Use Static vs Dynamic Flows Static Flows are ideal when: Conversation structure is known upfront Paths follow predefined patterns Flow can be fully configured in advance Example: Customer service scripts, intake forms Dynamic Flows are better when: Paths depend on external data Flow structure needs runtime modification Complex decision trees are involved Example: Personalized recommendations, adaptive workflows Installation If you’re already using Pipecat: Copy Ask AI pip install pipecat-ai-flows If you’re starting fresh: Copy Ask AI # Basic installation pip install pipecat-ai-flows # Install Pipecat with specific LLM provider options: pip install "pipecat-ai[daily,openai,deepgram]" # For OpenAI pip install "pipecat-ai[daily,anthropic,deepgram]" # For Anthropic pip install "pipecat-ai[daily,google,deepgram]" # For Google 💡 Want to design your flows visually? Try the online Flow Editor Core Concepts Designing Conversation Flows Functions in Pipecat Flows serve two key purposes: Processing data (likely by interfacing with external systems and APIs) Advancing the conversation to the next node Each function can do one or both. LLMs decide when to run each function, via their function calling (or tool calling) mechanism. Defining a Function A function is expected to return a (result, next_node) tuple. 
More precisely, it’s expected to return: Copy Ask AI # (result, next_node) Tuple[Optional[FlowResult], Optional[Union[NodeConfig, str ]]] If the function processes data, it should return a non- None value for the first element of the tuple. This value should be a FlowResult or subclass. If the function advances the conversation to the next node, it should return a non- None value for the second element of the tuple. This value can be either: A NodeConfig defining the next node (for dynamic flows) A string identifying the next node (for static flows) Example Function Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig async def check_availability ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: # Read arguments date = args[ "date" ] time = args[ "time" ] # Read previously-stored data party_size = flow_manager.state.get( "party_size" ) # Use flow_manager for immediate user feedback await flow_manager.task.queue_frame(TTSSpeakFrame( "Checking our reservation system..." )) # Store data in flow state for later use flow_manager.state[ "requested_date" ] = date # Interface with reservation system is_available = await reservation_system.check_availability(date, time, party_size) # Assemble result result = { "status" : "success" , "available" : available } # Decide which node to go to next if is_available: next_node = create_confirmation_node() else : next_node = create_no_availability_node() # Return both result and next node return result, next_node Node Structure Each node in your flow represents a conversation state and consists of three main components: Messages Nodes use two types of messages to control the conversation: Role Messages : Define the bot’s personality or role (optional) Copy Ask AI "role_messages" : [ { "role" : "system" , "content" : "You are a friendly pizza ordering assistant. Keep responses casual and upbeat." } ] Task Messages : Define what the bot should do in the current node Copy Ask AI "task_messages" : [ { "role" : "system" , "content" : "Ask the customer which pizza size they'd like: small, medium, or large." } ] Role messages are typically defined in your initial node and inherited by subsequent nodes, while task messages are specific to each node’s purpose. Functions Functions in Pipecat Flows can: Process data Specify node transitions Do both This leads to two conceptual types of functions: Node functions , which only process data. Edge functions , which also (or only) transition to the next node. The function itself ( which you can read more about here ) is usually wrapped in a function configuration, which also contains some metadata about the function. Function Configuration Pipecat Flows supports three ways of specifying function configuration: Provider-specific dictionary format Copy Ask AI # Dictionary format { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } FlowsFunctionSchema Copy Ask AI # Using FlowsFunctionSchema from pipecat_flows import FlowsFunctionSchema size_function = FlowsFunctionSchema( name = "select_size" , description = "Select pizza size" , properties = { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} }, required = [ "size" ], handler = select_size ) # Use in node configuration node_config = { "task_messages" : [ ... 
], "functions" : [size_function] } The FlowsFunctionSchema approach provides some advantages over the provider-specific dictionary format: Consistent structure across LLM providers Simplified parameter definition Cleaner, more readable code Both dictionary and FlowsFunctionSchema approaches are fully supported. FlowsFunctionSchema is recommended for new projects as it provides better type checking and a provider-independent format. Direct function usage (auto-configuration) This approach lets you bypass specifying a standalone function configuration. Instead, relevant function metadata is automatically extracted from the function’s signature and docstring: name description properties (including individual property description s) required Note that the function signature is a bit different when using direct functions. The first parameter is the FlowManager , followed by any others necessary for the function. Copy Ask AI from pipecat_flows import FlowManager, FlowResult async def select_pizza_order ( flow_manager : FlowManager, size : str , pizza_type : str , additional_toppings : list[ str ] = [], ) -> tuple[FlowResult, str ]: """ Record the pizza order details. Args: size (str): Size of the pizza. Must be one of "small", "medium", or "large". pizza_type (str): Type of pizza. Must be one of "pepperoni", "cheese", "supreme", or "vegetarian". additional_toppings (list[str]): List of additional toppings. Defaults to empty list. """ ... # Use in node configuration node_config = { "task_messages" : [ ... ], "functions" : [select_pizza_order] } Node Functions Functions that process data within a single conversational state, without switching nodes. When called, they: Execute their handler to do the data processing (typically by interfacing with an external system or API) Trigger an immediate LLM completion with the result Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, None ]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) return { "status" : "success" , "size" : size }, None # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Edge Functions Functions that specify a transition between nodes (optionally processing data first). 
When called, they: Execute their handler to do any data processing (optional) and determine the next node Add the function result to the LLM context Trigger LLM completion after both the function result and the next node’s messages are in the context Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) result = { "status" : "success" , "size" : size } next_node = create_confirmation_node() return result, next_node # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Actions Actions are operations that execute as part of the lifecycle of a node, with two distinct timing options: Pre-actions: execute when entering the node, before the LLM completion Post-actions: execute after the LLM completion Pre-Actions Execute when entering the node, before LLM inference. Useful for: Providing immediate feedback while waiting for LLM responses Bridging gaps during longer function calls Setting up state or context Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." # Immediate feedback during processing } ], Note that when the node is configured with respond_immediately: False , the pre_actions still run when entering the node, which may be well before LLM inference, depending on how long the user takes to speak first. Avoid mixing tts_say actions with chat completions as this may result in a conversation flow that feels unnatural. tts_say are best used as filler words when the LLM will take time to generate an completion. Post-Actions Execute after LLM inference completes. Useful for: Cleanup operations State finalization Ensuring proper sequence of operations Copy Ask AI "post_actions" : [ { "type" : "end_conversation" # Ensures TTS completes before ending } ] Note that when the node is configured with respond_immediately: False , the post_actions still only run after the first LLM inference, which may be a while depending on how long the user takes to speak first. Timing Considerations Pre-actions : Execute immediately, before any LLM processing begins LLM Inference : Processes the node’s messages and functions Post-actions : Execute after LLM processing and TTS completion For example, when using end_conversation as a post-action, the sequence is: LLM generates response TTS speaks the response End conversation action executes This ordering ensures proper completion of all operations. Action Types Flows comes equipped with pre-canned actions and you can also define your own action behavior. See the reference docs for more information. Deciding Who Speaks First For each node in the conversation, you can decide whether the LLM should respond immediately upon entering the node (the default behavior) or whether the LLM should wait for the user to speak first before responding. You do this using the respond_immediately field. respond_immediately=False may be particularly useful in the very first node, especially in outbound-calling cases where the user has to first answer the phone to trigger the conversation. 
Copy Ask AI NodeConfig( task_messages = [ { "role" : "system" , "content" : "Warmly greet the customer and ask how many people are in their party. This is your only job for now; if the customer asks for something else, politely remind them you can't do it." , } ], respond_immediately = False , # ... other fields ) Keep in mind that if you specify respond_immediately=False , the user may not be aware of the conversational task at hand when entering the node (the bot hasn’t told them yet). While it’s always important to have guardrails in your node messages to keep the conversation on topic, letting the user speak first makes it even more so. Context Management Pipecat Flows provides three strategies for managing conversation context during node transitions: Context Strategies APPEND (default): Adds new messages to the existing context, maintaining the full conversation history RESET : Clears the context and starts fresh with the new node’s messages RESET_WITH_SUMMARY : Resets the context but includes an AI-generated summary of the previous conversation Configuration Context strategies can be configured globally or per-node: Copy Ask AI from pipecat_flows import ContextStrategy, ContextStrategyConfig # Global strategy configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, context_strategy = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far, focusing on decisions made and important information collected." ) ) # Per-node strategy configuration node_config = { "task_messages" : [ ... ], "functions" : [ ... ], "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Provide a concise summary of the customer's order details and preferences." ) } Strategy Selection Choose your strategy based on your conversation needs: Use APPEND when full conversation history is important Use RESET when previous context might confuse the current node’s purpose Use RESET_WITH_SUMMARY for long conversations where key points need to be preserved When using RESET_WITH_SUMMARY, if summary generation fails or times out, the system automatically falls back to RESET strategy for resilience. State Management The state variable in FlowManager is a shared dictionary that persists throughout the conversation. 
Think of it as a conversation memory that lets you: Store user information Track conversation progress Share data between nodes Inform decision-making Here’s a practical example of a pizza ordering flow: Copy Ask AI # Store user choices as they're made async def select_size ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle pizza size selection.""" size = args[ "size" ] # Initialize order in state if it doesn't exist if "order" not in flow_manager.state: flow_manager.state[ "order" ] = {} # Store the selection flow_manager.state[ "order" ][ "size" ] = size return { "status" : "success" , "size" : size}, "toppings" async def select_toppings ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle topping selection.""" topping = args[ "topping" ] # Get existing order and toppings order = flow_manager.state.get( "order" , {}) toppings = order.get( "toppings" , []) # Add new topping toppings.append(topping) order[ "toppings" ] = toppings flow_manager.state[ "order" ] = order return { "status" : "success" , "toppings" : toppings}, "finalize" async def finalize_order ( args : FlowArgs) -> tuple[FlowResult, str ]: """Process the complete order.""" order = flow_manager.state.get( "order" , {}) # Validate order has required information if "size" not in order: return { "status" : "error" , "error" : "No size selected" } # Calculate price based on stored selections size = order[ "size" ] toppings = order.get( "toppings" , []) price = calculate_price(size, len (toppings)) return { "status" : "success" , "summary" : f "Ordered: { size } pizza with { ', ' .join(toppings) } " , "price" : price }, "end" In this example: select_size initializes the order and stores the size select_toppings builds a list of toppings finalize_order uses the stored information to process the complete order The state variable makes it easy to: Build up information across multiple interactions Access previous choices when needed Validate the complete order Calculate final results This is particularly useful when information needs to be collected across multiple conversation turns or when later decisions depend on earlier choices. LLM Provider Support Pipecat Flows automatically handles format differences between LLM providers: OpenAI Format Copy Ask AI "functions" : [{ "type" : "function" , "function" : { "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } } }] Anthropic Format Copy Ask AI "functions" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "input_schema" : { ... } }] Google (Gemini) Format Copy Ask AI "functions" : [{ "function_declarations" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } }] }] You don’t need to handle these differences manually - Pipecat Flows adapts your configuration to the correct format based on your LLM provider. Implementation Approaches Static Flows Static flows use a configuration-driven approach where the entire conversation structure is defined upfront. Basic Setup Copy Ask AI from pipecat_flows import FlowManager # Define flow configuration flow_config = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... 
] } } } # Initialize flow manager with static configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize() Example FlowConfig Copy Ask AI flow_config = { "initial_node" : "start" , "nodes" : { "start" : { "role_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. You must ALWAYS use the available functions to progress the conversation. This is a phone conversation and your responses will be converted to audio. Keep the conversation friendly, casual, and polite. Avoid outputting special characters and emojis." , } ], "task_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. Ask if they want pizza or sushi." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "choose_pizza" , "handler" : choose_pizza, # Returns [None, "pizza_order"] "description" : "User wants pizza" , "parameters" : { "type" : "object" , "properties" : {}} } } ] }, "pizza_order" : { "task_messages" : [ ... ], "functions" : [ { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, # Returns [FlowResult, "toppings"] "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } } } } ] } } } Dynamic Flows Dynamic flows create and modify conversation paths at runtime based on data or business logic. Example Implementation Here’s a complete example of a dynamic insurance quote flow: Copy Ask AI from pipecat_flows import FlowManager, FlowArgs, FlowResult # Define handlers and transitions async def collect_age ( args : FlowArgs, flow_manager : FlowManager) -> tuple[AgeResult, NodeConfig]: """Process age collection.""" age = args[ "age" ] # Assemble result result = AgeResult( status = "success" , age = age) # Decide which node to go to next if age < 25 : await flow_manager.set_node_from_config(create_young_adult_node()) else : await flow_manager.set_node_from_config(create_standard_node()) return result, age # Node creation functions def create_initial_node () -> NodeConfig: """Create initial age collection node.""" return { "name" : "initial" , "role_messages" : [ { "role" : "system" , "content" : "You are an insurance quote assistant." } ], "task_messages" : [ { "role" : "system" , "content" : "Ask for the customer's age." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "collect_age" , "handler" : collect_age, "description" : "Collect customer age" , "parameters" : { "type" : "object" , "properties" : { "age" : { "type" : "integer" } } } } } ] } def create_young_adult_node () -> Dict[ str , Any]: """Create node for young adult quotes.""" return { "name" : "young_adult" , "task_messages" : [ { "role" : "system" , "content" : "Explain our special young adult coverage options." } ], "functions" : [ ... ] # Additional quote-specific functions } def create_standard_node () -> Dict[ str , Any]: """Create node for standard quotes.""" return { "name" : "standard" , "task_messages" : [ { "role" : "system" , "content" : "Present our standard coverage options." } ], "functions" : [ ... 
] # Additional quote-specific functions } # Initialize flow manager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize(create_initial_node()) Best Practices Store shared data in flow_manager.state Create separate functions for node creation Flow Editor The Pipecat Flow Editor provides a visual interface for creating and managing conversation flows. It offers a node-based interface that makes it easier to design, visualize, and modify your flows. Visual Design Node Types Start Node (Green): Entry point of your flow Copy Ask AI "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ] } Flow Nodes (Blue): Intermediate states Copy Ask AI "collect_info" : { "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] } End Node (Red): Final state Copy Ask AI "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [{ "type" : "end_conversation" }] } Function Nodes : Edge Functions (Purple): Create transitions Copy Ask AI { "name" : "next_node" , "description" : "Transition to next state" } Node Functions (Orange): Perform operations Copy Ask AI { "name" : "process_data" , "handler" : process_data_handler, "description" : "Process user data" } Naming Conventions Start Node : Use descriptive names (e.g., “greeting”, “welcome”) Flow Nodes : Name based on purpose (e.g., “collect_info”, “verify_data”) End Node : Conventionally named “end” Functions : Use clear, action-oriented names Function Configuration Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_handler, "description" : "Process user data" , "parameters" : { ... } } } When using the Flow Editor, function handlers can be specified using the __function__: token: Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : "__function__:process_data" , # References function in main script "description" : "Process user data" , "parameters" : { ... } } } The handler will be looked up in your main script when the flow is executed. When function handlers are specified in the flow editor, they will be exported with the __function__: token. Using the Editor Creating a New Flow Start with a descriptively named Start Node Add Flow Nodes for each conversation state Connect nodes using Edge Functions Add Node Functions for operations Include an End Node Import/Export Copy Ask AI # Export format { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] }, "process" : { "task_messages" : [ ... ], "functions" : [ ... ], }, "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [ ... 
] } } } Tips Use the visual preview to verify flow logic Test exported configurations Document node purposes and transitions Keep flows modular and maintainable Try the editor at flows.pipecat.ai OpenAI Audio Models and APIs Overview On this page Key Concepts Example Flows When to Use Static vs Dynamic Flows Installation Core Concepts Designing Conversation Flows Defining a Function Example Function Node Structure Messages Functions Function Configuration Node Functions Edge Functions Actions Pre-Actions Post-Actions Timing Considerations Action Types Deciding Who Speaks First Context Management Context Strategies Configuration Strategy Selection State Management LLM Provider Support OpenAI Format Anthropic Format Google (Gemini) Format Implementation Approaches Static Flows Basic Setup Example FlowConfig Dynamic Flows Example Implementation Best Practices Flow Editor Visual Design Node Types Naming Conventions Function Configuration Using the Editor Creating a New Flow Import/Export Tips Assistant Responses are generated using AI and may contain mistakes.
filters_wake-notifier-filter_848bf42d.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/filters/wake-notifier-filter#constructor-parameters
Title: WakeNotifierFilter - Pipecat
==================================================
WakeNotifierFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frame Filters WakeNotifierFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview WakeNotifierFilter monitors the pipeline for specific frame types and triggers a notification when those frames pass a custom filter condition. It passes all frames through unchanged while performing this notification side-effect. Constructor Parameters notifier BaseNotifier required The notifier object to trigger when conditions are met types Tuple[Type[Frame]] required Tuple of frame types to monitor filter Callable[[Frame], Awaitable[bool]] required Async function that examines each matching frame and returns True to trigger notification Functionality The processor operates as follows: Checks if the incoming frame matches any of the specified types If it’s a matching type, calls the filter function with the frame If the filter returns True, triggers the notifier Passes all frames through unchanged, regardless of the filtering result This allows for notification side-effects without modifying the pipeline’s data flow. Output Frames All frames pass through unchanged in their original direction No frames are modified or filtered out Usage Example Copy Ask AI from pipecat.frames.frames import TranscriptionFrame, UserStartedSpeakingFrame from pipecat.processors.filters import WakeNotifierFilter from pipecat.sync.event_notifier import EventNotifier # Create an event notifier wake_event = EventNotifier() # Create filter that notifies when certain wake phrases are detected async def wake_phrase_filter ( frame ): if isinstance (frame, TranscriptionFrame): return "hey assistant" in frame.text.lower() return False # Add to pipeline wake_notifier = WakeNotifierFilter( notifier = wake_event, types = (TranscriptionFrame, UserStartedSpeakingFrame), filter = wake_phrase_filter ) # In another component, wait for the notification async def handle_wake_event (): await wake_event.wait() print ( "Wake phrase detected!" ) Frame Flow Notes Acts as a transparent pass-through for all frames Can trigger external events without modifying pipeline flow Useful for signaling between pipeline components Can monitor for multiple frame types simultaneously Uses async filter function for complex conditions Functions as a “listener” that doesn’t affect the data stream Can be used for logging, analytics, or coordinating external systems WakeCheckFilter OpenTelemetry On this page Overview Constructor Parameters Functionality Output Frames Usage Example Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
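Because all frames pass through unchanged, WakeNotifierFilter can be dropped directly into an existing pipeline. A minimal placement sketch, reusing the wake_notifier from the usage example above and assuming the transport, stt, llm, and tts processors are constructed elsewhere:

from pipecat.pipeline.pipeline import Pipeline

# Place the notifier after STT so it sees TranscriptionFrames; it forwards
# every frame untouched and only fires `wake_event` as a side effect.
pipeline = Pipeline(
    [
        transport.input(),
        stt,
        wake_notifier,
        llm,
        tts,
        transport.output(),
    ]
)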
flows_pipecat-flows_8ba305e4.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/frameworks/flows/pipecat-flows#param-reset-with-summary
Title: Pipecat Flows - Pipecat
==================================================
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frameworks Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline New to building conversational flows? Check out our Pipecat Flows guide first. Installation Existing Pipecat installation Fresh Pipecat installation Copy Ask AI pip install pipecat-ai-flows Core Types FlowArgs FlowArgs Dict[str, Any] Type alias for function handler arguments. FlowResult FlowResult TypedDict Base type for function handler results. Additional fields can be included as needed. Show Fields status str Optional status field error str Optional error message FlowConfig FlowConfig TypedDict Configuration for the entire conversation flow. Show Fields initial_node str required Starting node identifier nodes Dict[str, NodeConfig] required Map of node names to configurations NodeConfig NodeConfig TypedDict Configuration for a single node in the flow. Show Fields name str The name of the node, used in debug logging in dynamic flows. If no name is specified, an automatically-generated UUID is used. Copy Ask AI # Example name "name" : "greeting" role_messages List[dict] Defines the role or persona of the LLM. Required for the initial node and optional for subsequent nodes. Copy Ask AI # Example role messages "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant..." } ], task_messages List[dict] required Defines the task for a given node. Required for all nodes. Copy Ask AI # Example task messages "task_messages" : [ { "role" : "system" , # May be `user` depending on the LLM "content" : "Ask the user for their name..." } ], context_strategy ContextStrategyConfig Strategy for managing context during transitions to this node. Copy Ask AI # Example context strategy configuration "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) functions List[Union[dict, FlowsFunctionSchema]] required LLM function / tool call configurations, defined in one of the supported formats . Copy Ask AI # Using provider-specific dictionary format "functions" : [ { "type" : "function" , "function" : { "name" : "get_current_movies" , "handler" : get_movies, "description" : "Fetch movies currently playing" , "parameters" : { ... } }, } ] # Using FlowsFunctionSchema "functions" : [ FlowsFunctionSchema( name = "get_current_movies" , description = "Fetch movies currently playing" , properties = { ... }, required = [ ... ], handler = get_movies ) ] # Using direct functions (auto-configuration) "functions" : [get_movies] pre_actions List[dict] Actions that execute before the LLM inference. For example, you can send a message to the TTS to speak a phrase (e.g. “Hold on a moment…”), which may be effective if an LLM function call takes time to execute. 
Copy Ask AI # Example pre_actions "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." } ], post_actions List[dict] Actions that execute after the LLM inference. For example, you can end the conversation. Copy Ask AI # Example post_actions "post_actions" : [ { "type" : "end_conversation" } ] respond_immediately bool If set to False , the LLM will not respond immediately when the node is set, but will instead wait for the user to speak first before responding. Defaults to True . Copy Ask AI # Example usage "respond_immediately" : False Function Handler Types LegacyFunctionHandler Callable[[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]] Legacy function handler that only receives arguments. Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) FlowFunctionHandler Callable[[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]] Modern function handler that receives both arguments and FlowManager . Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) DirectFunction DirectFunction Function that is meant to be passed directly into a NodeConfig rather than into the handler field of a function configuration. It must be an async function with flow_manager: FlowManager as its first parameter. It must return a ConsolidatedFunctionResult , which is a tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) ContextStrategy ContextStrategy Enum Strategy for managing conversation context during node transitions. Show Values APPEND str Default strategy. Adds new messages to existing context. RESET str Clears context and starts fresh with new messages. RESET_WITH_SUMMARY str Resets context but includes an AI-generated summary. ContextStrategyConfig ContextStrategyConfig dataclass Configuration for context management strategy. Show Fields strategy ContextStrategy required The strategy to use for context management summary_prompt Optional[str] Required when using RESET_WITH_SUMMARY. Prompt text for generating the conversation summary. Copy Ask AI # Example usage config = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) FlowsFunctionSchema FlowsFunctionSchema class A standardized schema for defining functions in Pipecat Flows with flow-specific properties. Show Constructor Parameters name str required Name of the function description str required Description of the function’s purpose properties Dict[str, Any] required Dictionary defining properties types and descriptions required List[str] required List of required parameter names handler Optional[FunctionHandler] Function handler to process the function call transition_to Optional[str] deprecated Target node to transition to after function execution Deprecated: instead of transition_to , use a “consolidated” handler that returns a tuple (result, next node). transition_callback Optional[Callable] deprecated Callback function for dynamic transitions Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). 
You cannot specify both transition_to and transition_callback in the same function schema. Example usage: Copy Ask AI from pipecat_flows import FlowsFunctionSchema # Define a function schema collect_name_schema = FlowsFunctionSchema( name = "collect_name" , description = "Record the user's name" , properties = { "name" : { "type" : "string" , "description" : "The user's name" } }, required = [ "name" ], handler = collect_name_handler ) # Use in node configuration node_config = { "name" : "greeting" , "task_messages" : [ { "role" : "system" , "content" : "Ask the user for their name." } ], "functions" : [collect_name_schema] } # Pass to flow manager await flow_manager.set_node_from_config(node_config) FlowManager FlowManager class Main class for managing conversation flows, supporting both static (configuration-driven) and dynamic (runtime-determined) flows. Show Constructor Parameters task PipelineTask required Pipeline task for frame queueing llm LLMService required LLM service instance (OpenAI, Anthropic, or Google). Must be initialized with the corresponding pipecat-ai provider dependency installed. context_aggregator Any required Context aggregator used for pushing messages to the LLM service tts Optional[Any] deprecated Optional TTS service for voice actions. Deprecated: No need to explicitly pass tts to FlowManager in order to use tts_say actions. flow_config Optional[FlowConfig] Optional static flow configuration context_strategy Optional[ContextStrategyConfig] Optional configuration for how context should be managed during transitions. Defaults to APPEND strategy if not specified. Methods initialize method Initialize the flow with starting messages. Show Parameters initial_node NodeConfig The initial conversation node (needed for dynamic flows only). If not specified, you’ll need to call set_node_from_config() to kick off the conversation. Show Raises FlowInitializationError If initialization fails set_node method deprecated Set up a new conversation node programmatically (dynamic flows only). In dynamic flows, the application advances the conversation using set_node to set up each next node. In static flows, set_node is triggered under the hood when a node contains a transition_to field. Deprecated: use the following patterns instead of set_node : Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() If you really need to set a node explicitly, use set_node_from_config() (note: its name will be read from its NodeConfig ) Show Parameters node_id str required Identifier for the new node node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails set_node_from_config method Set up a new conversation node programmatically (dynamic flows only). Note that this method should only be used in rare circumstances. Most often, you should: Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() Show Parameters node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails register_action method Register a handler for a custom action type. 
Show Parameters action_type str required String identifier for the action handler Callable required Async or sync function that handles the action get_current_context method Get the current conversation context. Returns a list of messages in the current context, including system messages, user messages, and assistant responses. Show Returns messages List[dict] List of messages in the current context Show Raises FlowError If context aggregator is not available Example usage: Copy Ask AI # Access current conversation context context = flow_manager.get_current_context() # Use in handlers async def process_response ( args : FlowArgs) -> tuple[FlowResult, str ]: context = flow_manager.get_current_context() # Process conversation history return { "status" : "success" }, "next" State Management The FlowManager provides a state dictionary for storing conversation data: Access state Access in transitions Copy Ask AI flow_manager.state: Dict[ str , Any] # Store data flow_manager.state[ "user_age" ] = 25 Usage Examples Static Flow Dynamic Flow Copy Ask AI flow_config: FlowConfig = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant. Your responses will be converted to audio." } ], "task_messages" : [ { "role" : "system" , "content" : "Start by greeting the user and asking for their name." } ], "functions" : [{ "type" : "function" , "function" : { "name" : "collect_name" , "handler" : collect_name_handler, "description" : "Record user's name" , "parameters" : { ... } } }] } } } # Create and initialize the FlowManager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) # Initialize the flow_manager to start the conversation await flow_manager.initialize() Node Functions concept Functions that execute operations within a single conversational state, without switching nodes. Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def process_data ( args : FlowArgs) -> tuple[FlowResult, None ]: """Handle data processing within a node.""" data = args[ "data" ] result = await process(data) return { "status" : "success" , "processed_data" : result }, None # Function configuration { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, "description" : "Process user data" , "parameters" : { "type" : "object" , "properties" : { "data" : { "type" : "string" } } } } } Edge Functions concept Functions that specify a transition between nodes (optionally processing data first). Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def next_step ( args : FlowArgs) -> tuple[ None , str ]: """Specify the next node to transition to.""" return None , "target_node" # Return NodeConfig instead of str for dynamic flows # Function configuration { "type" : "function" , "function" : { "name" : "next_step" , "handler" : next_step, "description" : "Transition to next node" , "parameters" : { "type" : "object" , "properties" : {}} } } Function Properties handler Optional[Callable] Async function that processes data within a node and/or specifies the next node ( more details here ). 
Can be specified as: Direct function reference Either a Callable function or a string with __function__: prefix (e.g., "__function__:process_data" ) to reference a function in the main script Direct Reference Function Token Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, # Callable function "parameters" : { ... } } } transition_callback Optional[Callable] deprecated Handler for dynamic flow transitions. Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). Must be an async function with one of these signatures: Copy Ask AI # New style (recommended) async def handle_transition ( args : Dict[ str , Any], result : FlowResult, flow_manager : FlowManager ) -> None : """Handle transition to next node.""" if result.available: # Type-safe access to result await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Legacy style (supported for backwards compatibility) async def handle_transition ( args : Dict[ str , Any], flow_manager : FlowManager ) -> None : """Handle transition to next node.""" await flow_manager.set_node_from_config(create_next_node()) The callback receives: args : Arguments from the function call result : Typed result from the function handler (new style only) flow_manager : Reference to the FlowManager instance Example usage: Copy Ask AI async def handle_availability_check ( args : Dict, result : TimeResult, # Typed result flow_manager : FlowManager ): """Handle availability check and transition based on result.""" if result.available: await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Use in function configuration { "type" : "function" , "function" : { "name" : "check_availability" , "handler" : check_availability, "parameters" : { ... }, "transition_callback" : handle_availability_check } } Note: A function cannot have both transition_to and transition_callback . Handler Signatures Function handlers passed as a handler in a function configuration can be defined with three different signatures: Modern (Args + FlowManager) Legacy (Args Only) No Arguments Copy Ask AI async def handler_with_flow_manager ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Modern handler that receives both arguments and FlowManager access.""" # Access state previous_data = flow_manager.state.get( "stored_data" ) # Access pipeline resources await flow_manager.task.queue_frame(TTSSpeakFrame( "Processing your request..." )) # Store data in state for later flow_manager.state[ "new_data" ] = args[ "input" ] return { "status" : "success" , "result" : "Processed with flow access" }, create_next_node() The framework automatically detects which signature your handler is using and calls it appropriately. If you’re passing your function directly into your NodeConfig rather than as a handler in a function configuration, you’d use a somewhat different signature: Direct Copy Ask AI async def do_something ( flow_manager : FlowManager, foo : int , bar : str = "" ) -> tuple[FlowResult, NodeConfig]: """ Do something interesting. Args: foo (int): The foo to do something interesting with. bar (string): The bar to do something interesting with. Defaults to empty string. 
""" result = await fetch_data(foo, bar) next_node = create_end_node() return result, next_node Return Types Success Response Error Response Copy Ask AI { "status" : "success" , "data" : "some data" # Optional additional data } Provider-Specific Formats You don’t need to handle these format differences manually - use the standard format and the FlowManager will adapt it for your chosen provider. OpenAI Anthropic Google (Gemini) Copy Ask AI { "type" : "function" , "function" : { "name" : "function_name" , "handler" : handler, "description" : "Description" , "parameters" : { ... } } } Actions pre_actions and post_actions are used to manage conversation flow. They are included in the NodeConfig and executed before and after the LLM completion, respectively. Three kinds of actions are available: Pre-canned actions: These actions perform common tasks and require little configuration. Function actions: These actions run developer-defined functions at the appropriate time. Custom actions: These are fully developer-defined actions, providing flexibility at the expense of complexity. Pre-canned Actions Common actions shipped with Flows for managing conversation flow. To use them, just add them to your NodeConfig . tts_say action Speaks text immediately using the TTS service. Copy Ask AI Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Processing your request..." # Required } ] end_conversation action Ends the conversation and closes the connection. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "end_conversation" , "text" : "Goodbye!" # Optional farewell message } ] Function Actions Actions that run developer-defined functions at the appropriate time. For example, if used in post_actions , they’ll run after the bot has finished talking and after any previous post_actions have finished. function action Calls the developer-defined function at the appropriate time. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "function" , "handler" : bot_turn_ended # Required } ] Custom Actions Fully developer-defined actions, providing flexibility at the expense of complexity. Here’s the complexity: because these actions aren’t queued in the Pipecat pipeline, they may execute seemingly early if used in post_actions ; they’ll run immediately after the LLM completion is triggered but won’t wait around for the bot to finish talking. Why would you want this behavior? You might be writing an action that: Itself just queues another Frame into the Pipecat pipeline (meaning there would no benefit to waiting around for sequencing purposes) Does work that can be done a bit sooner, like logging that the LLM was updated Custom actions are composed of at least: type str required String identifier for the action handler Callable required Async or sync function that handles the action Example: Copy Ask AI Copy Ask AI # Define custom action handler async def custom_notification ( action : dict , flow_manager : FlowManager): """Custom action handler.""" message = action.get( "message" , "" ) await notify_user(message) # Use in node configuration "pre_actions" : [ { "type" : "notify" , "handler" : send_notification, "message" : "Attention!" , } ] Exceptions FlowError exception Base exception for all flow-related errors. Copy Ask AI Copy Ask AI from pipecat_flows import FlowError try : await flow_manager.set_node_from_config(config) except FlowError as e: print ( f "Flow error: { e } " ) FlowInitializationError exception Raised when flow initialization fails. 
Copy Ask AI Copy Ask AI from pipecat_flows import FlowInitializationError try : await flow_manager.initialize() except FlowInitializationError as e: print ( f "Initialization failed: { e } " ) FlowTransitionError exception Raised when a state transition fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowTransitionError try : await flow_manager.set_node_from_config(node_config) except FlowTransitionError as e: print ( f "Transition failed: { e } " ) InvalidFunctionError exception Raised when an invalid or unavailable function is specified. Copy Ask AI Copy Ask AI from pipecat_flows import InvalidFunctionError try : await flow_manager.set_node_from_config({ "functions" : [{ "type" : "function" , "function" : { "name" : "invalid_function" } }] }) except InvalidFunctionError as e: print ( f "Invalid function: { e } " ) RTVI Observer PipelineParams On this page Installation Core Types FlowArgs FlowResult FlowConfig NodeConfig Function Handler Types ContextStrategy ContextStrategyConfig FlowsFunctionSchema FlowManager Methods State Management Usage Examples Function Properties Handler Signatures Return Types Provider-Specific Formats Actions Pre-canned Actions Function Actions Custom Actions Exceptions Assistant Responses are generated using AI and may contain mistakes.
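As a complement to the custom action example above, register_action can wire a custom action type to a handler up front so that nodes reference it by type alone. A minimal sketch, assuming a flow_manager constructed as shown earlier; the "log_event" action type and its fields are illustrative, and the handler signature mirrors the custom action example above:

from loguru import logger
from pipecat_flows import FlowManager

async def log_event(action: dict, flow_manager: FlowManager):
    """Handler for the illustrative "log_event" action type."""
    logger.info(f"Flow event: {action.get('label', 'unnamed')}")

# Register once, then reference the action by type from any node.
flow_manager.register_action("log_event", log_event)

node_config = {
    "task_messages": [...],
    "functions": [...],
    "post_actions": [
        {"type": "log_event", "label": "quote_completed"},
    ],
}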
fundamentals_function-calling_48720366.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/_sites/docs.pipecat.ai/guides/fundamentals/function-calling#what-you-can-build
Title: Overview - Pipecat
==================================================
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
fundamentals_recording-audio_ce92d889.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/guides/fundamentals/recording-audio#post-processing-pipeline
2
+
Title: Recording Conversation Audio - Pipecat
3
+
==================================================
4
+

5
+
Recording Conversation Audio - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Recording Conversation Audio Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Overview Recording audio from conversations provides valuable data for analysis, debugging, and quality control. You have two options for how to record with Pipecat: Option 1: Record using your transport service provider Record without writing custom code by using your transport provider’s recording capabilities. In addition to saving you development time, some providers offer unique recording features. Refer to your service provider’s documentation to learn more. Option 2: Create your own recording pipeline Pipecat’s AudioBufferProcessor makes it easy to capture high-quality audio recordings of both the user and bot during interactions. Opt for this approach if you want more control over your recording. This guide focuses on how to record using the AudioBufferProcessor , including high-level guidance for how to set up post-processing jobs for longer recordings. How the AudioBufferProcessor Works The AudioBufferProcessor captures audio by: Collecting audio frames from both the user (input) and bot (output) Emitting events with recorded audio data Providing options for composite or separate track recordings Add the processor to your pipeline after the transport.output() to capture both the user audio and the bot audio as it’s spoken.
Audio Recording Options The AudioBufferProcessor offers several configuration options: Composite recording : Combined audio from both user and bot Track-level recording : Separate audio files for user and bot Turn-based recording : Individual audio clips for each speaking turn Mono or stereo output : Single channel mixing or two-channel separation Basic Implementation Step 1: Create an Audio Buffer Processor Initialize the audio buffer processor with your desired configuration: Copy Ask AI from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor # Create audio buffer processor with default settings audiobuffer = AudioBufferProcessor( num_channels = 1 , # 1 for mono, 2 for stereo (user left, bot right) enable_turn_audio = False , # Enable per-turn audio recording user_continuous_stream = True , # User has continuous audio stream ) Step 2: Add to Your Pipeline Place the processor in your pipeline after all audio-producing components: Copy Ask AI pipeline = Pipeline( [ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), audiobuffer, # Add after all audio components context_aggregator.assistant(), ] ) Step 3: Start Recording Explicitly start recording when needed, typically when a session begins: Copy Ask AI @transport.event_handler ( "on_client_connected" ) async def on_client_connected ( transport , client ): logger.info( "Client connected" ) # Important: Start recording explicitly await audiobuffer.start_recording() # Continue with session initialization... You must call start_recording() explicitly to begin capturing audio. The processor won’t record automatically when initialized. Step 4: Handle Audio Data Register an event handler to process audio data: Copy Ask AI @audiobuffer.event_handler ( "on_audio_data" ) async def on_audio_data ( buffer , audio , sample_rate , num_channels ): # Save or process the composite audio timestamp = datetime.datetime.now().strftime( "%Y%m%d_%H%M%S" ) filename = f "recordings/conversation_ { timestamp } .wav" # Create the WAV file with wave.open(filename, "wb" ) as wf: wf.setnchannels(num_channels) wf.setsampwidth( 2 ) # 16-bit audio wf.setframerate(sample_rate) wf.writeframes(audio) logger.info( f "Saved recording to { filename } " ) If recording separate tracks, you can use the on_track_audio_data event handler to save user and bot audio separately. Recording Longer Conversations For conversations that last a few minutes, it may be sufficient to just buffer the audio in memory. However, for longer sessions, storing audio in memory poses two challenges: Memory Usage : Long recordings can consume significant memory, leading to potential crashes or performance issues. Conversation Loss : If the application crashes or the connection drops, you may lose all recorded audio. Instead, consider using a chunked approach to record audio in manageable segments. This allows you to periodically save audio data to disk or upload it to cloud storage, reducing memory usage and ensuring data persistence.
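As noted in Step 4, separate user and bot tracks can be saved with the on_track_audio_data event handler; before moving on to chunked recording, here is a short sketch of that handler. The signature matches the chunked-recording example below, it reuses the audiobuffer and logger objects from the steps above, and the file layout is just an assumption for illustration.

import datetime
import wave


@audiobuffer.event_handler("on_track_audio_data")
async def on_track_audio_data(buffer, user_audio, bot_audio, sample_rate, num_channels):
    # Write each track as its own mono WAV file
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    for name, audio in (("user", user_audio), ("bot", bot_audio)):
        filename = f"recordings/{name}_{timestamp}.wav"
        with wave.open(filename, "wb") as wf:
            wf.setnchannels(1)           # one channel per track
            wf.setsampwidth(2)           # 16-bit PCM
            wf.setframerate(sample_rate)
            wf.writeframes(audio)
        logger.info(f"Saved {name} track to {filename}")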
Chunked Recording Set a reasonable buffer_size to trigger periodic uploads: Copy Ask AI # 30-second chunks (recommended for most use cases) SAMPLE_RATE = 24000 CHUNK_DURATION = 30 # seconds audiobuffer = AudioBufferProcessor( sample_rate = SAMPLE_RATE , buffer_size = SAMPLE_RATE * 2 * CHUNK_DURATION # 2 bytes per sample (16-bit) ) chunk_counter = 0 @audiobuffer.event_handler ( "on_track_audio_data" ) async def on_chunk_ready ( buffer , user_audio , bot_audio , sample_rate , num_channels ): global chunk_counter # Upload or save individual chunks await upload_audio_chunk( f "user_chunk_ { chunk_counter :03d} .wav" , user_audio, sample_rate, 1 ) await upload_audio_chunk( f "bot_chunk_ { chunk_counter :03d} .wav" , bot_audio, sample_rate, 1 ) chunk_counter += 1 Multipart Upload Strategy For cloud storage, consider using multipart uploads to stream audio chunks: Conceptual Approach: Initialize multipart upload when recording starts Upload chunks as parts when buffers fill (every ~30 seconds) Complete multipart upload when recording ends Post-process to create final WAV file(s) Benefits: Memory efficient for long sessions Fault tolerant (no data loss if connection drops) Enables real-time processing and analysis Parallel upload of multiple tracks Post-Processing Pipeline After uploading chunks, create final audio files using tools like FFmpeg: Concatenating Audio Files: Copy Ask AI # Method 1: Simple concatenation (same format) ffmpeg -i "concat:chunk_001.wav|chunk_002.wav|chunk_003.wav" -acodec copy final.wav # Method 2: Using file list (recommended for many chunks) # Create filelist.txt with format: # file 'chunk_001.wav' # file 'chunk_002.wav' # ... ffmpeg -f concat -safe 0 -i filelist.txt -c copy final_recording.wav Automation Considerations: Use sequence numbers in chunk filenames for proper ordering Include metadata (sample rate, channels, duration) with each chunk Implement retry logic for failed uploads Consider using cloud functions/lambdas for automatic post-processing Next Steps Try the Audio Recording Example Explore a complete working example that demonstrates how to record and save both composite and track-level audio with Pipecat. AudioBufferProcessor Reference Read the complete API reference documentation for advanced configuration options and event handlers. Consider implementing audio recording in your application for quality assurance, training data collection, or creating conversation archives. The recorded audio can be stored locally, uploaded to cloud storage, or processed in real-time for further analysis. Muting User Input Recording Transcripts On this page Overview Option 1: Record using your transport service provider Option 2: Create your own recording pipeline How the AudioBufferProcessor Works Audio Recording Options Basic Implementation Step 1: Create an Audio Buffer Processor Step 2: Add to Your Pipeline Step 3: Start Recording Step 4: Handle Audio Data Recording Longer Conversations Chunked Recording Multipart Upload Strategy Post-Processing Pipeline Next Steps Assistant Responses are generated using AI and may contain mistakes.
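The upload strategy above is described only conceptually, so here is one possible shape for the upload_audio_chunk helper that the chunked-recording example calls. It wraps each chunk in a WAV container and uploads it as its own S3 object with boto3; S3 and boto3 are assumptions (the guide does not prescribe a storage backend), and the bucket and key names are placeholders.

import asyncio
import io
import wave

import boto3

s3 = boto3.client("s3")
BUCKET = "my-recordings-bucket"  # placeholder bucket name


def _wav_bytes(audio: bytes, sample_rate: int, num_channels: int) -> bytes:
    # Wrap raw 16-bit PCM in a WAV container so every chunk is playable on its own
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
    return buf.getvalue()


async def upload_audio_chunk(filename: str, audio: bytes, sample_rate: int, num_channels: int):
    # boto3 is blocking, so push the upload off the event loop
    body = _wav_bytes(audio, sample_rate, num_channels)
    await asyncio.to_thread(s3.put_object, Bucket=BUCKET, Key=f"sessions/{filename}", Body=body)

Uploading each chunk as its own object keeps every part independently playable and avoids S3's 5 MiB minimum part size for true multipart uploads (a 30-second, 24 kHz, 16-bit mono chunk is only about 1.4 MB); the ffmpeg step described above can then stitch the downloaded chunks into a final recording.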
image-generation_google-imagen_e955ad4d.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/server/services/image-generation/google-imagen#urlimagerawframe
2
+
Title: Google Imagen - Pipecat
3
+
==================================================
4
+

5
+
Google Imagen - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Image Generation Google Imagen Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation fal Google Imagen OpenAI Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GoogleImageGenService provides high-quality image generation capabilities using Google’s Imagen models. It supports generating multiple images from text prompts with various customization options. Installation To use GoogleImageGenService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[google]" You’ll also need to set up your Google API key as an environment variable: GOOGLE_API_KEY Configuration Constructor Parameters params InputParams default: "InputParams()" Generation parameters configuration api_key str required Google API key for authentication Input Parameters number_of_images int default: "1" Number of images to generate (1-8) model str default: "imagen-3.0-generate-002" Model identifier negative_prompt str default: "None" Elements to exclude from generation Input The service accepts text prompts through its image generation pipeline. Output Frames URLImageRawFrame url string Generated image URL (null for Google implementation as it returns raw bytes) image bytes Raw image data size tuple Image dimensions (width, height) format string Image format (e.g., ‘JPEG’) ErrorFrame error string Error information if generation fails Usage Example Copy Ask AI from pipecat.services.google.image import GoogleImageGenService # Configure service image_gen = GoogleImageGenService( api_key = "your-google-api-key" , params = GoogleImageGenService.InputParams( number_of_images = 2 , model = "imagen-3.0-generate-002" , negative_prompt = "blurry, distorted, low quality" ) ) # Use in pipeline main_pipeline = Pipeline( [ transport.input(), context_aggregator.user(), llm_service, image_gen, tts_service, transport.output(), context_aggregator.assistant(), ] ) Frame Flow Metrics Support The service supports metrics collection: Time to First Byte (TTFB) Processing duration API response metrics Model Support Google’s Imagen service offers different model variants: Model ID Description imagen-3.0-generate-002 Latest Imagen model with high-quality outputs See other available models in Google’s Imagen documentation . Error Handling Copy Ask AI try : async for frame in service.run_image_gen(prompt): if isinstance (frame, ErrorFrame): handle_error(frame.error) except Exception as e: logger.error( f "Image generation error: { e } " ) fal OpenAI On this page Overview Installation Configuration Constructor Parameters Input Parameters Input Output Frames URLImageRawFrame ErrorFrame Usage Example Frame Flow Metrics Support Model Support Error Handling Assistant Responses are generated using AI and may contain mistakes.
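Because the Google implementation returns raw bytes in URLImageRawFrame rather than a hosted URL, you may want to persist the generated images yourself. One way to do that is a small custom frame processor placed immediately after image_gen in the pipeline above; the sketch below is illustrative and assumes the standard FrameProcessor base class and the frame fields (image, format) described on this page.

import os

from pipecat.frames.frames import URLImageRawFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class ImageSaver(FrameProcessor):
    # Writes generated images to disk and passes every frame along unchanged

    def __init__(self, output_dir: str = "generated_images"):
        super().__init__()
        self._output_dir = output_dir
        self._count = 0
        os.makedirs(output_dir, exist_ok=True)

    async def process_frame(self, frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, URLImageRawFrame) and frame.image:
            self._count += 1
            extension = (frame.format or "JPEG").lower()
            path = os.path.join(self._output_dir, f"image_{self._count:03d}.{extension}")
            with open(path, "wb") as f:
                f.write(frame.image)

        await self.push_frame(frame, direction)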
llm_aws_d3b6b447.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/server/services/llm/aws#installation
2
+
Title: AWS Bedrock - Pipecat
3
+
==================================================
4
+

5
+
AWS Bedrock - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM AWS Bedrock Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview AWS Bedrock LLM service provides access to Amazon’s foundation models including Anthropic Claude and Amazon Nova, with streaming responses, function calling, and multimodal capabilities through Amazon’s managed AI service. API Reference Complete API documentation and method details AWS Bedrock Docs Official AWS Bedrock documentation and features Example Code Working example with function calling Installation To use AWS Bedrock services, install the required dependencies: Copy Ask AI pip install "pipecat-ai[aws]" You’ll also need to set up your AWS credentials as environment variables: AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN (if using temporary credentials) AWS_REGION (defaults to “us-east-1”) Set up an IAM user with Amazon Bedrock access in your AWS account to obtain credentials. Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.aws.llm import AWSBedrockLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure the service llm = AWSBedrockLLMService( aws_region = "us-west-2" , model = "us.anthropic.claude-3-5-haiku-20241022-v1:0" , params = AWSBedrockLLMService.InputParams( temperature = 0.7 , ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Register function handler async def get_current_weather ( params ): location = params.arguments[ "location" ] format_type = params.arguments[ "format" ] result = { "conditions" : "sunny" , "temperature" : "75" , "unit" : format_type} await params.result_callback(result) llm.register_function( "get_current_weather" , get_current_weather) # Create context with system message messages = [ { "role" : "system" , "content" : "You are a helpful assistant with access to weather information." } ] context = OpenAILLMContext(messages, tools) context_aggregator = llm.create_context_aggregator(context) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), # Handles user messages llm, # Processes with AWS Bedrock tts, transport.output(), context_aggregator.assistant() # Captures responses ]) Metrics The service provides comprehensive AWS Bedrock metrics: Time to First Byte (TTFB) - Latency from request to first response token Processing Duration - Total request processing time Token Usage - Input tokens, output tokens, and total usage Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes Streaming Responses : All responses are streamed for low latency Context Persistence : Use context aggregators to maintain conversation history Error Handling : Automatic retry logic for rate limits and transient errors Message Format : Automatically converts between OpenAI and AWS Bedrock message formats Performance Modes : Choose “standard” or “optimized” latency based on your needs Regional Availability : Different models available in different AWS regions Vision Support : Image processing available with compatible models like Claude 3 Anthropic Azure On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
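Function handlers are also a natural place for error handling, so a failed lookup degrades into an apology instead of stalling the turn. Building on the get_current_weather handler above, the sketch below returns an error payload the model can explain to the user; fetch_weather_from_service is a hypothetical helper standing in for your real data source.

async def get_current_weather(params):
    location = params.arguments["location"]
    format_type = params.arguments["format"]
    try:
        # fetch_weather_from_service is a placeholder for your actual weather lookup
        weather = await fetch_weather_from_service(location, unit=format_type)
        await params.result_callback(weather)
    except Exception as e:
        # Returning an error payload lets the LLM apologize or retry rather than hang
        await params.result_callback({"error": f"Could not fetch weather for {location}: {e}"})


llm.register_function("get_current_weather", get_current_weather)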
llm_cerebras_35edb40b.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/server/services/llm/cerebras#overview
2
+
Title: Cerebras - Pipecat
3
+
==================================================
4
+

5
+
Cerebras - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM Cerebras Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview CerebrasLLMService provides access to Cerebras’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. API Reference Complete API documentation and method details Cerebras Docs Official Cerebras inference API documentation Example Code Working example with function calling Installation To use Cerebras services, install the required dependency: Copy Ask AI pip install "pipecat-ai[cerebras]" You’ll also need to set up your Cerebras API key as an environment variable: CEREBRAS_API_KEY . Get your API key from Cerebras Cloud . Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.cerebras.llm import CerebrasLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure the service llm = CerebrasLLMService( api_key = os.getenv( "CEREBRAS_API_KEY" ), model = "llama-3.3-70b" , params = CerebrasLLMService.InputParams( temperature = 0.7 , max_completion_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : "You are a helpful assistant for weather information. Keep responses concise for voice output." } ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handler async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Optional: Add function call feedback @llm.event_handler ( "on_function_calls_started" ) async def on_function_calls_started ( service , function_calls ): await tts.queue_frame(TTSSpeakFrame( "Let me check on that." )) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) Metrics Inherits all OpenAI-compatible metrics: Time to First Byte (TTFB) - Ultra-low latency measurement Processing Duration - Total request processing time Token Usage - Prompt tokens, completion tokens, and totals Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes OpenAI Compatibility : Full compatibility with OpenAI API parameters and responses Streaming Responses : All responses are streamed for minimal latency Function Calling : Full support for OpenAI-style tool calling Open Source Models : Access to latest Llama models with commercial licensing Azure DeepSeek On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
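The usage example builds the context aggregators but stops short of showing how the first bot turn is triggered. A common pattern is to queue the current context once a client connects so the LLM produces an opening line; the sketch below assumes a transport that emits on_client_connected (as the Daily and WebRTC transports do) and that get_context_frame() is available on the user aggregator in your Pipecat version, so treat it as illustrative.

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    # Push the existing context through the pipeline so the bot greets the user first
    await task.queue_frames([context_aggregator.user().get_context_frame()])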
llm_fireworks_b0e46e86.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/server/services/llm/fireworks#overview
2
+
Title: Fireworks AI - Pipecat
3
+
==================================================
4
+

5
+
Fireworks AI - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation LLM Fireworks AI Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview FireworksLLMService provides access to Fireworks AI’s language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. API Reference Complete API documentation and method details Fireworks Docs Official Fireworks AI API documentation and features Example Code Working example with function calling Installation To use Fireworks AI services, install the required dependency: Copy Ask AI pip install "pipecat-ai[fireworks]" You’ll also need to set up your Fireworks API key as an environment variable: FIREWORKS_API_KEY . Get your API key from Fireworks AI Console . Frames Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. Usage Example Copy Ask AI import os from pipecat.services.fireworks.llm import FireworksLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure Fireworks service llm = FireworksLLMService( api_key = os.getenv( "FIREWORKS_API_KEY" ), model = "accounts/fireworks/models/firefunction-v2" , # Optimized for function calling params = FireworksLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful assistant optimized for voice interactions. Keep responses concise and avoid special characters for audio output.""" } ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handler with feedback async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Optional: Add function call feedback @llm.event_handler ( "on_function_calls_started" ) async def on_function_calls_started ( service , function_calls ): await tts.queue_frame(TTSSpeakFrame( "Let me check on that." )) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) Metrics Inherits all OpenAI metrics capabilities: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Prompt tokens, completion tokens, and totals Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) Additional Notes OpenAI Compatibility : Full compatibility with OpenAI API features and parameters Function Calling : Specialized firefunction models optimized for tool use Cost Effective : Competitive pricing for open-source model inference DeepSeek Google Gemini On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
pipeline_pipeline-task_cc5b3be0.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/server/pipeline/pipeline-task#task-lifecycle-management
2
+
Title: PipelineTask - Pipecat
3
+
==================================================
4
+

5
+
PipelineTask - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Pipeline PipelineTask Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview PipelineTask is the central class for managing pipeline execution. It handles the lifecycle of the pipeline, processes frames in both directions, manages task cancellation, and provides event handlers for monitoring pipeline activity. Basic Usage Copy Ask AI from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineParams, PipelineTask # Create a pipeline pipeline = Pipeline([ ... ]) # Create a task with the pipeline task = PipelineTask(pipeline) # Queue frames for processing await task.queue_frame(TTSSpeakFrame( "Hello, how can I help you today?" )) # Run the pipeline runner = PipelineRunner() await runner.run(task) Constructor Parameters pipeline BasePipeline required The pipeline to execute. params PipelineParams default: "PipelineParams()" Configuration parameters for the pipeline. See PipelineParams for details. observers List[BaseObserver] default: "[]" List of observers for monitoring pipeline execution. See Observers for details. clock BaseClock default: "SystemClock()" Clock implementation for timing operations. task_manager Optional[BaseTaskManager] default: "None" Custom task manager for handling asyncio tasks. If None, a default TaskManager is used. check_dangling_tasks bool default: "True" Whether to check for processors’ tasks finishing properly. idle_timeout_secs Optional[float] default: "300" Timeout in seconds before considering the pipeline idle. Set to None to disable idle detection. See Pipeline Idle Detection for details. idle_timeout_frames Tuple[Type[Frame], ...] default: "(BotSpeakingFrame, LLMFullResponseEndFrame)" Frame types that should prevent the pipeline from being considered idle. See Pipeline Idle Detection for details. cancel_on_idle_timeout bool default: "True" Whether to automatically cancel the pipeline task when idle timeout is reached. See Pipeline Idle Detection for details. enable_tracing bool default: "False" Whether to enable OpenTelemetry tracing. See The OpenTelemetry guide for details. enable_turn_tracking bool default: "False" Whether to enable turn tracking. See The OpenTelemetry guide for details. conversation_id Optional[str] default: "None" Custom ID for the conversation. If not provided, a UUID will be generated. See The OpenTelemetry guide for details. additional_span_attributes Optional[dict] default: "None" Any additional attributes to add to top-level OpenTelemetry conversation span. See The OpenTelemetry guide for details. Methods Task Lifecycle Management run() async Starts and manages the pipeline execution until completion or cancellation. 
Copy Ask AI await task.run() stop_when_done() async Sends an EndFrame to the pipeline to gracefully stop the task after all queued frames have been processed. Copy Ask AI await task.stop_when_done() cancel() async Stops the running pipeline immediately by sending a CancelFrame. Copy Ask AI await task.cancel() has_finished() bool Returns whether the task has finished (all processors have stopped). Copy Ask AI if task.has_finished(): print ( "Task is complete" ) Frame Management queue_frame() async Queues a single frame to be pushed down the pipeline. Copy Ask AI await task.queue_frame(TTSSpeakFrame( "Hello!" )) queue_frames() async Queues multiple frames to be pushed down the pipeline. Copy Ask AI frames = [TTSSpeakFrame( "Hello!" ), TTSSpeakFrame( "How are you?" )] await task.queue_frames(frames) Event Handlers PipelineTask provides an event handler that can be registered using the event_handler decorator: on_idle_timeout Triggered when no activity frames (as specified by idle_timeout_frames ) have been received within the idle timeout period. Copy Ask AI @task.event_handler ( "on_idle_timeout" ) async def on_idle_timeout ( task ): print ( "Pipeline has been idle too long" ) await task.queue_frame(TTSSpeakFrame( "Are you still there?" )) PipelineParams Pipeline Idle Detection On this page Overview Basic Usage Constructor Parameters Methods Task Lifecycle Management Frame Management Event Handlers on_idle_timeout Assistant Responses are generated using AI and may contain mistakes.
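Putting the lifecycle pieces together, here is a small sketch of one way to end a session cleanly: nudge the user after the first idle timeout, then say goodbye and let queued frames drain with stop_when_done(). The strike counting is purely illustrative, and whether the idle timer re-arms after the first timeout depends on your idle-detection settings, so treat this as a pattern rather than prescribed behavior.

from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask


async def run_bot(pipeline):
    task = PipelineTask(
        pipeline,
        params=PipelineParams(enable_metrics=True),
        idle_timeout_secs=120,         # consider the pipeline idle after two minutes
        cancel_on_idle_timeout=False,  # handle idle ourselves instead of auto-cancelling
    )
    idle_strikes = 0

    @task.event_handler("on_idle_timeout")
    async def on_idle_timeout(task):
        nonlocal idle_strikes
        idle_strikes += 1
        if idle_strikes == 1:
            # First idle period: check whether the user is still there
            await task.queue_frame(TTSSpeakFrame("Are you still there?"))
        else:
            # Still idle: say goodbye, then stop once queued frames have been processed
            await task.queue_frame(TTSSpeakFrame("I'll end the session now. Goodbye!"))
            await task.stop_when_done()

    await PipelineRunner().run(task)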
pipeline_pipeline-task_f0365874.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/server/pipeline/pipeline-task#basic-usage
2
+
Title: PipelineTask - Pipecat
3
+
==================================================
4
+

5
+
PipelineTask - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Pipeline PipelineTask Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview PipelineTask is the central class for managing pipeline execution. It handles the lifecycle of the pipeline, processes frames in both directions, manages task cancellation, and provides event handlers for monitoring pipeline activity. Basic Usage Copy Ask AI from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineParams, PipelineTask # Create a pipeline pipeline = Pipeline([ ... ]) # Create a task with the pipeline task = PipelineTask(pipeline) # Queue frames for processing await task.queue_frame(TTSSpeakFrame( "Hello, how can I help you today?" )) # Run the pipeline runner = PipelineRunner() await runner.run(task) Constructor Parameters pipeline BasePipeline required The pipeline to execute. params PipelineParams default: "PipelineParams()" Configuration parameters for the pipeline. See PipelineParams for details. observers List[BaseObserver] default: "[]" List of observers for monitoring pipeline execution. See Observers for details. clock BaseClock default: "SystemClock()" Clock implementation for timing operations. task_manager Optional[BaseTaskManager] default: "None" Custom task manager for handling asyncio tasks. If None, a default TaskManager is used. check_dangling_tasks bool default: "True" Whether to check for processors’ tasks finishing properly. idle_timeout_secs Optional[float] default: "300" Timeout in seconds before considering the pipeline idle. Set to None to disable idle detection. See Pipeline Idle Detection for details. idle_timeout_frames Tuple[Type[Frame], ...] default: "(BotSpeakingFrame, LLMFullResponseEndFrame)" Frame types that should prevent the pipeline from being considered idle. See Pipeline Idle Detection for details. cancel_on_idle_timeout bool default: "True" Whether to automatically cancel the pipeline task when idle timeout is reached. See Pipeline Idle Detection for details. enable_tracing bool default: "False" Whether to enable OpenTelemetry tracing. See The OpenTelemetry guide for details. enable_turn_tracking bool default: "False" Whether to enable turn tracking. See The OpenTelemetry guide for details. conversation_id Optional[str] default: "None" Custom ID for the conversation. If not provided, a UUID will be generated. See The OpenTelemetry guide for details. additional_span_attributes Optional[dict] default: "None" Any additional attributes to add to top-level OpenTelemetry conversation span. See The OpenTelemetry guide for details. Methods Task Lifecycle Management run() async Starts and manages the pipeline execution until completion or cancellation. 
Copy Ask AI await task.run() stop_when_done() async Sends an EndFrame to the pipeline to gracefully stop the task after all queued frames have been processed. Copy Ask AI await task.stop_when_done() cancel() async Stops the running pipeline immediately by sending a CancelFrame. Copy Ask AI await task.cancel() has_finished() bool Returns whether the task has finished (all processors have stopped). Copy Ask AI if task.has_finished(): print ( "Task is complete" ) Frame Management queue_frame() async Queues a single frame to be pushed down the pipeline. Copy Ask AI await task.queue_frame(TTSSpeakFrame( "Hello!" )) queue_frames() async Queues multiple frames to be pushed down the pipeline. Copy Ask AI frames = [TTSSpeakFrame( "Hello!" ), TTSSpeakFrame( "How are you?" )] await task.queue_frames(frames) Event Handlers PipelineTask provides an event handler that can be registered using the event_handler decorator: on_idle_timeout Triggered when no activity frames (as specified by idle_timeout_frames ) have been received within the idle timeout period. Copy Ask AI @task.event_handler ( "on_idle_timeout" ) async def on_idle_timeout ( task ): print ( "Pipeline has been idle too long" ) await task.queue_frame(TTSSpeakFrame( "Are you still there?" )) PipelineParams Pipeline Idle Detection On this page Overview Basic Usage Constructor Parameters Methods Task Lifecycle Management Frame Management Event Handlers on_idle_timeout Assistant Responses are generated using AI and may contain mistakes.
react_hooks_a5603daf.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/client/react/hooks#usepipecatclientmediadevices
2
+
Title: Hooks - Pipecat
3
+
==================================================
4
+

5
+
Hooks - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation API Reference Hooks Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference Components Hooks React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat React SDK provides hooks for accessing client functionality, managing media devices, and handling events. usePipecatClient Provides access to the PipecatClient instance originally passed to PipecatClientProvider . Copy Ask AI import { usePipecatClient } from "@pipecat-ai/client-react" ; function MyComponent () { const pcClient = usePipecatClient (); await pcClient . connect ({ endpoint: 'https://your-pipecat-api-url/connect' , requestData: { // Any custom data your /connect endpoint requires } }); } useRTVIClientEvent Allows subscribing to RTVI client events. It is advised to wrap handlers with useCallback . Copy Ask AI import { useCallback } from "react" ; import { RTVIEvent , TransportState } from "@pipecat-ai/client-js" ; import { useRTVIClientEvent } from "@pipecat-ai/client-react" ; function EventListener () { useRTVIClientEvent ( RTVIEvent . TransportStateChanged , useCallback (( transportState : TransportState ) => { console . log ( "Transport state changed to" , transportState ); }, []) ); } Arguments event RTVIEvent required handler function required usePipecatClientMediaDevices Manage and list available media devices. Copy Ask AI import { usePipecatClientMediaDevices } from "@pipecat-ai/client-react" ; function DeviceSelector () { const { availableCams , availableMics , selectedCam , selectedMic , updateCam , updateMic , } = usePipecatClientMediaDevices (); return ( <> < select name = "cam" onChange = { ( ev ) => updateCam ( ev . target . value ) } value = { selectedCam ?. deviceId } > { availableCams . map (( cam ) => ( < option key = { cam . deviceId } value = { cam . deviceId } > { cam . label } </ option > )) } </ select > < select name = "mic" onChange = { ( ev ) => updateMic ( ev . target . value ) } value = { selectedMic ?. deviceId } > { availableMics . map (( mic ) => ( < option key = { mic . deviceId } value = { mic . deviceId } > { mic . label } </ option > )) } </ select > </> ); } usePipecatClientMediaTrack Access audio and video tracks. Copy Ask AI import { usePipecatClientMediaTrack } from "@pipecat-ai/client-react" ; function MyTracks () { const localAudioTrack = usePipecatClientMediaTrack ( "audio" , "local" ); const botAudioTrack = usePipecatClientMediaTrack ( "audio" , "bot" ); } Arguments trackType 'audio' | 'video' required participantType 'bot' | 'local' required usePipecatClientTransportState Returns the current transport state. Copy Ask AI import { usePipecatClientTransportState } from "@pipecat-ai/client-react" ; function ConnectionStatus () { const transportState = usePipecatClientTransportState (); } usePipecatClientCamControl Controls the local participant’s camera state. Copy Ask AI import { usePipecatClientCamControl } from "@pipecat-ai/client-react" ; function CamToggle () { const { enableCam , isCamEnabled } = usePipecatClientCamControl (); return ( < button onClick = { () => enableCam ( ! isCamEnabled ) } > { isCamEnabled ? 
"Disable Camera" : "Enable Camera" } </ button > ); } usePipecatClientMicControl Controls the local participant’s microphone state. Copy Ask AI import { usePipecatClientMicControl } from "@pipecat-ai/client-react" ; function MicToggle () { const { enableMic , isMicEnabled } = usePipecatClientMicControl (); return ( < button onClick = { () => enableMic ( ! isMicEnabled ) } > { isMicEnabled ? "Disable Microphone" : "Enable Microphone" } </ button > ); } Components SDK Introduction On this page usePipecatClient useRTVIClientEvent usePipecatClientMediaDevices usePipecatClientMediaTrack usePipecatClientTransportState usePipecatClientCamControl usePipecatClientMicControl Assistant Responses are generated using AI and may contain mistakes.
react_migration-guide_bc4b3f84.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/client/react/migration-guide#migration-steps
2
+
Title: RTVIClient Migration Guide for React - Pipecat
3
+
==================================================
4
+

5
+
RTVIClient Migration Guide for React - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation RTVIClient Migration Guide for React Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport This guide covers migrating from RTVIClient to the new PipecatClient in a React application. The new client introduces simplified configuration and improved client-server messaging. For an overview of the changes, see the top-level RTVIClient Migration Guide . Key Changes Package and Class Names Copy Ask AI // Old import { RTVIClient } from '@pipecat-ai/client-js' ; // New import { PipecatClient } from '@pipecat-ai/client-js' ; React Components and Hooks Copy Ask AI // Old import { RTVIClientProvider , RTVIClientAudio , RTVIClientVideo , useRTVIClient , useRTVIClientTransportState } from '@pipecat-ai/client-react' ; // New import { PipecatClientProvider , PipecatClientAudio , PipecatClientVideo , usePipecatClient , usePipecatClientTransportState } from '@pipecat-ai/client-react' ; Client and Transport Configuration Copy Ask AI // Old const transport = new DailyTransport (); const client = new RTVIClient ({ transport , params: { baseUrl: 'http://localhost:7860' , endpoints: { connect: '/connect' } } }); // New const client = new PipecatClient ({ transport: new DailyTransport (), // Connection params moved to connect() call }); Connection Method Copy Ask AI // Old await client . connect (); // New await client . connect ({ endpoint: 'http://localhost:7860/connect' , requestData: { // Any custom data your /connect endpoint requires llm_provider: 'openai' , initial_prompt: "You are a pirate captain" , // Any additional data } }); Function Call Handling Copy Ask AI // Old let llmHelper = new LLMHelper ({}); llmHelper . handleFunctionCall ( async ( data ) => { return await this . handleFunctionCall ( data . functionName , data . arguments ); }); client . registerHelper ( 'openai' , llmHelper ); // New client . registerFunctionCallHandler ( 'functionName' , async ( data ) => { // Handle function call return result ; }); Breaking Changes Configuration Structure : Connection parameters are now passed to connect() instead of constructor Helper System : Removed in favor of direct PipecatClient member functions or client-server messaging. Component Names : All React components renamed from RTVI prefix to Pipecat prefix Hook Names : All React hooks renamed from useRTVI prefix to usePipecat prefix Migration Steps Update package imports to use new names Move connection configuration from constructor to connect() method Replace any helper classes with corresponding PipecatClient methods or custom messaging Update React component and hook names Update any TypeScript types referencing old names On this page Key Changes Breaking Changes Migration Steps Assistant Responses are generated using AI and may contain mistakes.
s2s_openai_25a097b7.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/server/services/s2s/openai
2
+
Title: OpenAI Realtime Beta - Pipecat
3
+
==================================================
4
+

5
+
OpenAI Realtime Beta - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Speech OpenAI Realtime Beta Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech AWS Nova Sonic Gemini Multimodal Live OpenAI Realtime Beta Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline OpenAIRealtimeBetaLLMService provides real-time, multimodal conversation capabilities using OpenAI’s Realtime Beta API. It supports speech-to-speech interactions with integrated LLM processing, function calling, and advanced conversation management. Real-time Interaction Stream audio in real-time with minimal latency response times Speech Processing Built-in speech-to-text and text-to-speech capabilities with voice options Advanced Turn Detection Multiple voice activity detection options including semantic turn detection Powerful Function Calling Seamless support for calling external functions and APIs Installation To use OpenAIRealtimeBetaLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[openai]" You’ll also need to set up your OpenAI API key as an environment variable: OPENAI_API_KEY . Configuration Constructor Parameters api_key str required Your OpenAI API key model str default: "gpt-4o-realtime-preview-2025-06-03" The speech-to-speech model used for processing base_url str default: "wss://api.openai.com/v1/realtime" WebSocket endpoint URL session_properties SessionProperties Configuration for the realtime session start_audio_paused bool default: "False" Whether to start with audio input paused send_transcription_frames bool default: "True" Whether to emit transcription frames Session Properties The SessionProperties object configures the behavior of the realtime session: modalities List[Literal['text', 'audio']] The modalities to enable (default includes both text and audio) instructions str System instructions that guide the model’s behavior Copy Ask AI service = OpenAIRealtimeBetaLLMService( api_key = os.getenv( "OPENAI_API_KEY" ), session_properties = SessionProperties( instructions = "You are a helpful assistant. Be concise and friendly." 
) ) voice str Voice ID for text-to-speech (options: alloy, echo, fable, onyx, nova, shimmer) input_audio_format Literal['pcm16', 'g711_ulaw', 'g711_alaw'] Format of the input audio output_audio_format Literal['pcm16', 'g711_ulaw', 'g711_alaw'] Format of the output audio input_audio_transcription InputAudioTranscription Configuration for audio transcription Copy Ask AI from pipecat.services.openai_realtime_beta.events import InputAudioTranscription service = OpenAIRealtimeBetaLLMService( api_key = os.getenv( "OPENAI_API_KEY" ), session_properties = SessionProperties( input_audio_transcription = InputAudioTranscription( model = "gpt-4o-transcribe" , language = "en" , prompt = "This is a technical conversation about programming" ) ) ) input_audio_noise_reduction InputAudioNoiseReduction Configuration for audio noise reduction turn_detection Union[TurnDetection, SemanticTurnDetection, bool] Configuration for turn detection (set to False to disable) tools List[Dict] List of function definitions for tool/function calling tool_choice Literal['auto', 'none', 'required'] Controls when the model calls functions temperature float Controls randomness in responses (0.0 to 2.0) max_response_output_tokens Union[int, Literal['inf']] Maximum number of tokens to generate Input Frames Audio Input InputAudioRawFrame Frame Raw audio data for speech input Control Input StartInterruptionFrame Frame Signals start of user interruption UserStartedSpeakingFrame Frame Signals user started speaking UserStoppedSpeakingFrame Frame Signals user stopped speaking Context Input OpenAILLMContextFrame Frame Contains conversation context LLMMessagesAppendFrame Frame Appends messages to conversation Output Frames Audio Output TTSAudioRawFrame Frame Generated speech audio Control Output TTSStartedFrame Frame Signals start of speech synthesis TTSStoppedFrame Frame Signals end of speech synthesis Text Output TextFrame Frame Generated text responses TranscriptionFrame Frame Speech transcriptions Events on_conversation_item_created event Emitted when a conversation item on the server is created. Handler receives: item_id: str item: ConversationItem on_conversation_item_updated event Emitted when a conversation item on the server is updated. Handler receives: item_id: str item: Optional[ConversationItem] (may not exist for some updates) Methods retrieve_conversation_item method Retrieves a conversation item’s details from the server. 
Copy Ask AI async def retrieve_conversation_item ( self , item_id : str ) -> ConversationItem Usage Example Copy Ask AI from pipecat.services.openai_realtime_beta import OpenAIRealtimeBetaLLMService from pipecat.services.openai_realtime_beta.events import SessionProperties, TurnDetection # Configure service service = OpenAIRealtimeBetaLLMService( api_key = "your-api-key" , session_properties = SessionProperties( modalities = [ "audio" , "text" ], voice = "alloy" , turn_detection = TurnDetection( threshold = 0.5 , silence_duration_ms = 800 ), temperature = 0.7 ) ) # Use in pipeline pipeline = Pipeline([ audio_input, # Produces InputAudioRawFrame service, # Processes speech/generates responses audio_output # Handles TTSAudioRawFrame ]) Function Calling The service supports function calling with automatic response handling: Copy Ask AI from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.services.openai_realtime_beta import SessionProperties # Define weather function using standardized schema weather_function = FunctionSchema( name = "get_weather" , description = "Get weather information" , properties = { "location" : { "type" : "string" } }, required = [ "location" ] ) # Create tools schema tools = ToolsSchema( standard_tools = [weather_function]) # Configure service with tools llm = OpenAIRealtimeBetaLLMService( api_key = "your-api-key" , session_properties = SessionProperties( tools = tools, tool_choice = "auto" ) ) llm.register_function( "get_weather" , fetch_weather_from_api) See the Function Calling guide for: Detailed implementation instructions Provider-specific function definitions Handler registration examples Control over function call behavior Complete usage examples Frame Flow Metrics Support The service collects comprehensive metrics: Token usage (prompt and completion) Processing duration Time to First Byte (TTFB) Audio processing metrics Function call metrics Advanced Features Turn Detection Copy Ask AI # Server-side basic VAD turn_detection = TurnDetection( type = "server_vad" , threshold = 0.5 , prefix_padding_ms = 300 , silence_duration_ms = 800 ) # Server-side semantic VAD turn_detection = SemanticTurnDetection( type = "semantic_vad" , eagerness = "auto" , # default. could also be "low" | "medium" | "high" create_response = True # default interrupt_response = True # default ) # Disable turn detection turn_detection = False Context Management Copy Ask AI # Create context context = OpenAIRealtimeLLMContext( messages = [], tools = [], system = "You are a helpful assistant" ) # Create aggregators aggregators = service.create_context_aggregator(context) Foundational Examples OpenAI Realtime Beta Example Basic implementation showing core realtime features including audio streaming, turn detection, and function calling. 
Notes Supports real-time speech-to-speech conversation Handles interruptions and turn-taking Manages WebSocket connection lifecycle Provides function calling capabilities Supports conversation context management Includes comprehensive error handling Manages audio streaming and processing Handles both text and audio modalities Gemini Multimodal Live fal On this page Installation Configuration Constructor Parameters Session Properties Input Frames Audio Input Control Input Context Input Output Frames Audio Output Control Output Text Output Events Methods Usage Example Function Calling Frame Flow Metrics Support Advanced Features Turn Detection Context Management Foundational Examples Notes Assistant Responses are generated using AI and may contain mistakes.
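Because the realtime service handles transcription, reasoning, and speech in a single step, the surrounding pipeline is shorter than a cascaded STT/LLM/TTS setup: a transport on each end plus context aggregation. The sketch below is illustrative; it reuses the service and context objects from the snippets above, assumes a Daily transport configured as in other transport examples in these docs, and uses room_url and token as placeholders.

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    room_url,          # placeholder: your Daily room URL
    token,             # placeholder: your Daily token
    "Realtime bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),  # optional when server-side turn detection is enabled
    ),
)

context_aggregator = service.create_context_aggregator(context)

pipeline = Pipeline([
    transport.input(),               # user audio in
    context_aggregator.user(),
    service,                         # transcribes, reasons, and speaks in one step
    transport.output(),              # bot audio out
    context_aggregator.assistant(),
])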
serializers_plivo_7c6dac6c.txt
ADDED
@@ -0,0 +1,5 @@
1
+
URL: https://docs.pipecat.ai/server/services/serializers/plivo#constructor-parameters
2
+
Title: PlivoFrameSerializer - Pipecat
3
+
==================================================
4
+

5
+
PlivoFrameSerializer - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Serializers PlivoFrameSerializer Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Frame Serializer Overview ExotelFrameSerializer PlivoFrameSerializer TwilioFrameSerializer TelnyxFrameSerializer Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview PlivoFrameSerializer enables integration with Plivo’s Audio Streaming WebSocket protocol, allowing your Pipecat application to handle phone calls via Plivo’s voice services. Features Bidirectional audio conversion between Pipecat and Plivo DTMF (touch-tone) event handling Automatic call termination via Plivo’s REST API μ-law audio encoding/decoding Installation The PlivoFrameSerializer does not require any additional dependencies beyond the core Pipecat library. Configuration Constructor Parameters stream_id str required The Plivo Stream ID call_id Optional[str] default: "None" The associated Plivo Call ID (required for auto hang-up) auth_id Optional[str] default: "None" Plivo auth ID (required for auto hang-up) auth_token Optional[str] default: "None" Plivo auth token (required for auto hang-up) params InputParams default: "InputParams()" Configuration parameters InputParams Configuration plivo_sample_rate int default: "8000" Sample rate used by Plivo (typically 8kHz) sample_rate int | None default: "None" Optional override for pipeline input sample rate auto_hang_up bool default: "True" Whether to automatically terminate call on EndFrame Basic Usage Copy Ask AI from pipecat.serializers.plivo import PlivoFrameSerializer from pipecat.transports.network.fastapi_websocket import ( FastAPIWebsocketTransport, FastAPIWebsocketParams ) # Extract required values from Plivo WebSocket connection stream_id = start_message[ "start" ][ "streamId" ] call_id = start_message[ "start" ][ "callId" ] # Create serializer serializer = PlivoFrameSerializer( stream_id = stream_id, call_id = call_id, auth_id = "your_plivo_auth_id" , auth_token = "your_plivo_auth_token" ) # Use with FastAPIWebsocketTransport transport = FastAPIWebsocketTransport( websocket = websocket, params = FastAPIWebsocketParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer(), serializer = serializer, ) ) Hang-up Functionality When auto_hang_up is enabled, the serializer will automatically hang up the Plivo call when an EndFrame or CancelFrame is processed, using Plivo’s REST API: Copy Ask AI # Properly configured with hang-up support serializer = PlivoFrameSerializer( stream_id = stream_id, call_id = call_id, # Required for auto hang-up auth_id = os.getenv( "PLIVO_AUTH_ID" ), # Required for auto hang-up auth_token = os.getenv( "PLIVO_AUTH_TOKEN" ), # Required for auto hang-up ) Server Code Example Here’s a complete example of handling a Plivo WebSocket connection: Copy Ask AI from fastapi import FastAPI, WebSocket from pipecat.serializers.plivo import PlivoFrameSerializer import json import os 
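# The endpoint below accepts Plivo's WebSocket connection, reads the initial
# "start" event to extract the streamId and callId, and then builds a
# PlivoFrameSerializer with credentials from the environment so that auto
# hang-up can terminate the call when the pipeline ends.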
app = FastAPI() @app.websocket ( "/ws" ) async def websocket_endpoint ( websocket : WebSocket): await websocket.accept() # Read the start message from Plivo start_data = websocket.iter_text() start_message = json.loads( await start_data. __anext__ ()) # Extract Plivo-specific IDs from the start event start_info = start_message.get( "start" , {}) stream_id = start_info.get( "streamId" ) call_id = start_info.get( "callId" ) # Create serializer with authentication for auto hang-up serializer = PlivoFrameSerializer( stream_id = stream_id, call_id = call_id, auth_id = os.getenv( "PLIVO_AUTH_ID" ), auth_token = os.getenv( "PLIVO_AUTH_TOKEN" ), ) # Continue with transport and pipeline setup... Plivo XML Configuration To enable audio streaming with Plivo, you’ll need to configure your Plivo application to return appropriate XML: Copy Ask AI <? xml version = "1.0" encoding = "UTF-8" ?> < Response > < Stream keepCallAlive = "true" bidirectional = "true" contentType = "audio/x-mulaw;rate=8000" > wss://your-websocket-url/ws </ Stream > </ Response > The bidirectional="true" attribute is required for two-way audio communication, and keepCallAlive="true" prevents the call from being disconnected after XML execution. Key Differences from Twilio Stream Identifier : Plivo uses streamId instead of streamSid Call Identifier : Plivo uses callId instead of callSid XML Structure : Plivo uses <Stream> element directly instead of <Connect><Stream> Authentication : Plivo uses Auth ID and Auth Token instead of Account SID and Auth Token See the Plivo Chatbot example for a complete implementation. ExotelFrameSerializer TwilioFrameSerializer On this page Overview Features Installation Configuration Constructor Parameters InputParams Configuration Basic Usage Hang-up Functionality Server Code Example Plivo XML Configuration Key Differences from Twilio Assistant Responses are generated using AI and may contain mistakes.
|
server_utilities_e939aab0.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities#param-transformer
|
2 |
+
Title: Producer & Consumer Processors - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Producer & Consumer Processors - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Advanced Frame Processors Producer & Consumer Processors Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Producer & Consumer Processors UserIdleProcessor Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview The Producer and Consumer processors work as a pair to route frames between different parts of a pipeline, particularly useful when working with ParallelPipeline . They allow you to selectively capture frames from one pipeline branch and inject them into another. ProducerProcessor ProducerProcessor examines frames flowing through the pipeline, applies a filter to decide which frames to share, and optionally transforms these frames before sending them to connected consumers. Constructor Parameters filter Callable[[Frame], Awaitable[bool]] required An async function that determines which frames should be sent to consumers. Should return True for frames to be shared. transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before sending to consumers. By default, passes frames unchanged. passthrough bool default: "True" When True , passes all frames through the normal pipeline flow. When False , only passes through frames that don’t match the filter. ConsumerProcessor ConsumerProcessor receives frames from a ProducerProcessor and injects them into its pipeline branch. Constructor Parameters producer ProducerProcessor required The producer processor that will send frames to this consumer. transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before injecting them into the pipeline. direction FrameDirection default: "FrameDirection.DOWNSTREAM" The direction in which to push received frames. Usually DOWNSTREAM to send frames forward in the pipeline. Usage Examples Basic Usage: Moving TTS Audio Between Branches Copy Ask AI # Create a producer that captures TTS audio frames async def is_tts_audio ( frame : Frame) -> bool : return isinstance (frame, TTSAudioRawFrame) # Define an async transformer function async def tts_to_input_audio_transformer ( frame : Frame) -> Frame: if isinstance (frame, TTSAudioRawFrame): # Convert TTS audio to input audio format return InputAudioRawFrame( audio = frame.audio, sample_rate = frame.sample_rate, num_channels = frame.num_channels ) return frame producer = ProducerProcessor( filter = is_tts_audio, transformer = tts_to_input_audio_transformer, passthrough = True # Keep these frames in original pipeline ) # Create a consumer to receive the frames consumer = ConsumerProcessor( producer = producer, direction = FrameDirection.
DOWNSTREAM ) # Use in a ParallelPipeline pipeline = Pipeline([ transport.input(), ParallelPipeline( # Branch 1: LLM for bot responses [ llm, tts, producer, # Capture TTS audio here ], # Branch 2: Audio processing branch [ consumer, # Receive TTS audio here llm, # Speech-to-Speech LLM (audio in) ] ), transport.output(), ]) Sentry Metrics UserIdleProcessor On this page Overview ProducerProcessor Constructor Parameters ConsumerProcessor Constructor Parameters Usage Examples Basic Usage: Moving TTS Audio Between Branches Assistant Responses are generated using AI and may contain mistakes.
|
stt_aws_685577e5.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/stt/aws#input
|
2 |
+
Title: AWS Transcribe - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
AWS Transcribe - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text AWS Transcribe Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview AWSTranscribeSTTService provides real-time speech-to-text capabilities using Amazon Transcribe’s WebSocket API. It supports interim results, adjustable quality levels, and can handle continuous audio streams. Installation To use AWSTranscribeSTTService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[aws]" You’ll also need to set up your AWS credentials as environment variables: AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN (if using temporary credentials) AWS_REGION (defaults to “us-east-1”) You can obtain AWS credentials by setting up an IAM user with access to Amazon Transcribe in your AWS account. Configuration Constructor Parameters api_key str Your AWS secret access key (can also use environment variable) aws_access_key_id str Your AWS access key ID (can also use environment variable) aws_session_token str Your AWS session token for temporary credentials (can also use environment variable) region str default: "us-east-1" AWS region to use for Transcribe service sample_rate int default: "16000" Audio sample rate in Hz (only 8000 Hz or 16000 Hz are supported) language Language default: "Language.EN" Language for transcription Default Settings Copy Ask AI { "sample_rate" : 16000 , "language" : Language. EN , "media_encoding" : "linear16" , # AWS expects raw PCM "number_of_channels" : 1 , "show_speaker_label" : False , "enable_channel_identification" : False } Input The service processes InputAudioRawFrame instances containing: Raw PCM audio data 16-bit depth 8kHz or 16kHz sample rate (will convert to 16kHz if another rate is provided) Single channel (mono) Output Frames The service produces two types of frames during transcription: TranscriptionFrame Generated for final transcriptions, containing: text string Transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Language used for transcription InterimTranscriptionFrame Generated during ongoing speech, containing same fields as TranscriptionFrame but with preliminary results. Methods See the STT base class methods for additional functionality. Language Setting Copy Ask AI await service.set_language(Language. FR ) Usage Example Copy Ask AI from pipecat.services.aws.stt import AWSTranscribeSTTService # Configure service using environment variables for credentials stt = AWSTranscribeSTTService( region = "us-west-2" , sample_rate = 16000 , language = Language. 
EN ) # Or provide credentials directly stt = AWSTranscribeSTTService( aws_access_key_id = "YOUR_ACCESS_KEY_ID" , api_key = "YOUR_SECRET_ACCESS_KEY" , region = "us-west-2" , sample_rate = 16000 , language = Language. EN ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) Language Support AWS Transcribe STT supports the following languages: Language Code Description Service Codes Language.EN English (US) en-US Language.ES Spanish es-US Language.FR French fr-FR Language.DE German de-DE Language.IT Italian it-IT Language.PT Portuguese (Brazil) pt-BR Language.JA Japanese ja-JP Language.KO Korean ko-KR Language.ZH Chinese (Mandarin) zh-CN AWS Transcribe supports additional languages and regional variants. See the AWS Transcribe documentation for a complete list. Frame Flow Metrics Support The service supports the following metrics: Time to First Byte (TTFB) Processing duration Notes Requires valid AWS credentials with access to Amazon Transcribe Supports real-time transcription with interim results Handles WebSocket connection management and reconnection Only supports mono audio (single channel) Automatically handles audio format conversion to PCM Manages connection lifecycle (start, stop, cancel) AssemblyAI Azure On this page Overview Installation Configuration Constructor Parameters Default Settings Input Output Frames TranscriptionFrame InterimTranscriptionFrame Methods Language Setting Usage Example Language Support Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
|
stt_cartesia_8778939c.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/stt/cartesia#notes
|
2 |
+
Title: Cartesia - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Cartesia - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text Cartesia Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview CartesiaSTTService provides real-time speech-to-text capabilities using Cartesia’s WebSocket API. It supports streaming transcription with both interim and final results using the ink-whisper model. Installation To use CartesiaSTTService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[cartesia]" You’ll also need to set up your Cartesia API key as an environment variable: CARTESIA_API_KEY . You can obtain a Cartesia API key by signing up at Cartesia . Configuration Constructor Parameters api_key str required Your Cartesia API key base_url str default: "api.cartesia.ai" Custom Cartesia API endpoint URL sample_rate int default: "16000" Audio sample rate in Hz live_options CartesiaLiveOptions Custom transcription options CartesiaLiveOptions model str default: "ink-whisper" The Cartesia transcription model to use language str default: "en" Language code for transcription encoding str default: "pcm_s16le" Audio encoding format sample_rate int default: "16000" Audio sample rate in Hz Default Options Copy Ask AI CartesiaLiveOptions( model = "ink-whisper" , language = "en" , encoding = "pcm_s16le" , sample_rate = 16000 ) Input The service processes raw audio data with the following requirements: PCM audio format ( pcm_s16le ) 16-bit depth 16kHz sample rate (default) Single channel (mono) Output Frames The service produces two types of frames during transcription: TranscriptionFrame Generated for final transcriptions, containing: text string Final transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Detected or configured language InterimTranscriptionFrame Generated during ongoing speech, containing the same fields as TranscriptionFrame but with preliminary results. Methods See the STT base class methods for additional functionality. Language Setting The service supports language configuration through the CartesiaLiveOptions : Copy Ask AI live_options = CartesiaLiveOptions( language = "es" ) Model Selection Copy Ask AI live_options = CartesiaLiveOptions( model = "ink-whisper" ) Usage Example Copy Ask AI from pipecat.services.cartesia.stt import CartesiaSTTService, CartesiaLiveOptions from pipecat.transcriptions.language import Language # Basic configuration stt = CartesiaSTTService( api_key = os.getenv( "CARTESIA_API_KEY" ) ) # Advanced configuration live_options = CartesiaLiveOptions( model = "ink-whisper" , language = Language. 
ES .value, sample_rate = 16000 , encoding = "pcm_s16le" ) stt = CartesiaSTTService( api_key = os.getenv( "CARTESIA_API_KEY" ), live_options = live_options ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) Frame Flow Connection Management The service automatically manages WebSocket connections: Auto-reconnect : Reconnects automatically when the connection is closed due to timeout Finalization : Sends a “finalize” command when user stops speaking to flush the transcription session Error handling : Gracefully handles connection errors and WebSocket exceptions Metrics Support The service supports comprehensive metrics collection: Time to First Byte (TTFB) Processing duration Speech detection events Connection status Notes Requires valid Cartesia API key Supports real-time streaming transcription Handles automatic WebSocket connection management Includes comprehensive error handling Manages connection lifecycle automatically Azure Deepgram On this page Overview Installation Configuration Constructor Parameters CartesiaLiveOptions Default Options Input Output Frames TranscriptionFrame InterimTranscriptionFrame Methods Language Setting Model Selection Usage Example Frame Flow Connection Management Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
|
stt_riva_e59eeddb.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/stt/riva#transcriptionframe-2
|
2 |
+
Title: NVIDIA Riva - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
NVIDIA Riva - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Speech-to-Text NVIDIA Riva Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text AssemblyAI AWS Transcribe Azure Cartesia Deepgram Fal (Wizper) Gladia Google Groq (Whisper) NVIDIA Riva OpenAI SambaNova (Whisper) Speechmatics Ultravox Whisper LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview RivaSTTService provides real-time speech-to-text capabilities using NVIDIA’s Riva Parakeet model. It supports interim results and configurable recognition parameters for enhanced accuracy. RivaSegmentedSTTService provides speech-to-text capabilities via NVIDIA’s Riva Canary model. Installation To use RivaSTTService or RivaSegmentedSTTService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[riva]" You’ll also need to set up your NVIDIA API key as an environment variable: NVIDIA_API_KEY . You can obtain an NVIDIA API key by signing up through NVIDIA’s developer portal . RivaSTTService Configuration api_key str required Your NVIDIA API key server str default: "grpc.nvcf.nvidia.com:443" NVIDIA Riva server address model_function_map Mapping [str, str] A mapping of the NVIDIA function identifier for the STT service with the model name. sample_rate int default: "None" Audio sample rate in Hz params InputParams default: "InputParams()" Additional configuration parameters InputParams language Language default: "Language.EN_US" The language for speech recognition Input The service processes audio frames containing: Raw PCM audio data 16-bit depth Single channel (mono) Output Frames TranscriptionFrame Generated for final transcriptions, containing: text string Transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Language used for transcription InterimTranscriptionFrame Generated during ongoing speech, containing same fields as TranscriptionFrame but with preliminary results. RivaSegmentedSTTService Configuration api_key str required Your NVIDIA API key server str default: "grpc.nvcf.nvidia.com:443" NVIDIA Riva server address model_function_map Mapping [str, str] A mapping of the NVIDIA function identifier for the STT service with the model name. sample_rate int default: "None" Audio sample rate in Hz params InputParams default: "InputParams()" Additional configuration parameters InputParams language Language default: "Language.EN_US" The language for speech recognition Input The service processes audio frames containing: Raw audio bytes in WAV format Output Frames TranscriptionFrame Generated for final transcriptions, containing: text string Transcribed text user_id string User identifier timestamp string ISO 8601 formatted timestamp language Language Language used for transcription InterimTranscriptionFrame Generated during ongoing speech, containing same fields as TranscriptionFrame but with preliminary results. 
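As a rough illustration of how these output frames are typically consumed, here is a minimal sketch of a custom frame processor that logs interim and final transcriptions. The processor is illustrative and not part of the Riva services; it would usually sit immediately after the STT service in a pipeline.

from pipecat.frames.frames import Frame, InterimTranscriptionFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class TranscriptLogger(FrameProcessor):
    """Illustrative processor that prints transcription frames as they pass by."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"Final: {frame.text}")
        elif isinstance(frame, InterimTranscriptionFrame):
            print(f"Interim: {frame.text}")
        # Always pass the frame along so downstream processors still receive it
        await self.push_frame(frame, direction)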
Methods See the STT base class methods for additional functionality. Models Model Pipecat Class Model Card Link parakeet-ctc-1.1b-asr RivaSTTService NVIDIA Model Card canary-1b-asr RivaSegmentedSTTService NVIDIA Model Card Usage Examples RivaSTTService Copy Ask AI from pipecat.services.riva.stt import RivaSTTService from pipecat.transcriptions.language import Language # Configure service stt = RivaSTTService( api_key = "your-nvidia-api-key" , params = RivaSTTService.InputParams( language = Language. EN_US ) ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) RivaSegmentedSTTService Copy Ask AI from pipecat.services.riva.stt import RivaSegmentedSTTService from pipecat.transcriptions.language import Language # Configure service stt = RivaSegmentedSTTService( api_key = "your-nvidia-api-key" , params = RivaSegmentedSTTService.InputParams( language = Language. EN_US ) ) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, llm, ... ]) Language Support Riva model parakeet-ctc-1.1b-asr (default) primarily supports English with various regional accents: Language Code Description Service Codes Language.EN_US English (US) en-US Frame Flow Advanced Configuration The service supports several advanced configuration options that can be adjusted: _profanity_filter bool default: "False" Filter profanity from transcription _automatic_punctuation bool default: "False" Automatically add punctuation _no_verbatim_transcripts bool default: "False" Whether to disable verbatim transcripts _boosted_lm_words list default: "None" List of words to boost in the language model _boosted_lm_score float default: "4.0" Score applied to boosted words Example with Advanced Configuration Copy Ask AI # Configure service with advanced parameters stt = RivaSTTService( api_key = "your-nvidia-api-key" , params = RivaSTTService.InputParams( language = Language. EN_US ) ) # Configure advanced options stt._profanity_filter = True stt._automatic_punctuation = True stt._boosted_lm_words = [ "PipeCat" , "AI" , "speech" ] Notes Uses NVIDIA’s Riva AI Services platform Handles streaming audio input Provides real-time transcription results Manages connection lifecycle Uses asyncio for asynchronous processing Automatically cleans up resources on stop/cancel Groq (Whisper) OpenAI On this page Overview Installation RivaSTTService Configuration InputParams Input Output Frames TranscriptionFrame InterimTranscriptionFrame RivaSegmentedSTTService Configuration InputParams Input Output Frames TranscriptionFrame InterimTranscriptionFrame Methods Models Usage Examples RivaSTTService RivaSegmentedSTTService Language Support Frame Flow Advanced Configuration Example with Advanced Configuration Notes Assistant Responses are generated using AI and may contain mistakes.
|
telephony_daily-webrtc_03c1fa84.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/telephony/daily-webrtc#configuring-your-pipecat-bot
|
2 |
+
Title: Dial-in: WebRTC (Daily) - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Dial-in: WebRTC (Daily) - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Telephony Dial-in: WebRTC (Daily) Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Things you’ll need An active Daily developer key. One or more Daily provisioned phone numbers (covered below). Prefer to look at code? See the example project! We have a complete dialin-ready project using Daily as both a transport and PSTN/SIP provider in the Pipecat repo. This guide references the project and steps through the important parts that make dial-in work. Do I need to provision my phone numbers through Daily? You can use Daily solely as a transport if you prefer. This is particularly useful if you already have Twilio-provisioned numbers and workflows. In that case, you can configure Twilio to forward calls to your Pipecat agents and join a Daily WebRTC call. More details on using Twilio with Daily as a transport can be found here . If you’re starting from scratch, using everything on one platform offers some convenience. By provisioning your phone numbers through Daily and using Daily as the transport layer, you won’t need to worry about initial call routing. Purchasing a phone number You can purchase a number via the Daily REST API Purchase a random number Purchase specific number List numbers Copy Ask AI curl --request POST \ --url 'https://api.daily.co/v1/buy-phone-number' \ --header 'Authorization: Bearer [YOUR_DAILY_API_KEY]' \ --header 'Content-Type: application/json' Configuring your bot runner You’ll need an HTTP service that can receive incoming call hooks and trigger a new agent session. We discussed the concept of a bot runner in the deployment section, which we’ll build on here to add support for incoming phone calls. Within the start_bot method, we’ll need to grab both callId and callDomain from the incoming web request that is triggered by Daily when someone dials the number: bot_daily.py Copy Ask AI # Get the dial-in properties from the request try : data = await request.json() callId = data.get( "callId" ) callDomain = data.get( "callDomain" ) except Exception : raise HTTPException( status_code = 500 , detail = "Missing properties 'callId' or 'callDomain'" ) Full bot source code here Orchestrating incoming calls Daily needs a URL / webhook endpoint it can trigger when a user dials the phone number. We can configure this by assigning the number to an endpoint via their REST API. Here is an example: Copy Ask AI curl --location 'https://api.daily.co/v1' \ --header 'Content-Type: application/json' \ --header 'Authorization: Bearer [DAILY API TOKEN HERE]' \ --data '{ "properties": { "pinless_dialin": [ { "phone_number": "[DAILY PROVISIONED NUMBER HERE]", "room_creation_api": "[BOT RUNNER URL]/start_bot" } ] } }' If you want to test locally, you can expose your web method using a service such as ngrok . 
Example ngrok tunnel Copy Ask AI python bot_runner.py --host localhost --port 7860 --reload ngrok http localhost:7860 # E.g: https://123.ngrok.app/start_bot Creating a new SIP-enabled room We’ll need to configure the Daily room to be set up to receive SIP connections. daily-helpers.py included in Pipecat has some useful imports that make this easy. We just need to pass through new SIP parameters as part of room creation: bot_runner.py Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomParams, DailyRoomProperties, DailyRoomSipParams params = DailyRoomParams( properties = DailyRoomProperties( sip = DailyRoomSipParams( display_name = "sip-dialin" , video = False , sip_mode = "dial-in" , num_endpoints = 1 ) ) ) # Create sip-enabled Daily room via REST try : room: DailyRoomObject = daily_rest_helper.create_room( params = params) except Exception as e: raise HTTPException( status_code = 500 , detail = f "Unable to provision room { e } " ) print ( f "Daily room returned { room.url } { room.config.sip_endpoint } " ) Incoming calls will include both callId and callDomain properties in the body of the request; we’ll need to pass them to the Pipecat agent. For simplicity, our agents are spawned as sub-processes of the bot runner, so we’ll pass the callId and callDomain through as command line arguments: bot_runner.py Copy Ask AI proc = subprocess.Popen( [ f "python3 -m bot_daily -u { room.url } -t { token } -i { callId } -d { callDomain } " ], shell = True , bufsize = 1 , cwd = os.path.dirname(os.path.abspath( __file__ )) ) That’s all the configuration we need in our bot_runner.py . Configuring your Pipecat bot Let’s take a look at bot_daily.py and step through the differences from other examples. First, it’s set up to receive additional command line parameters which are passed through to the DailyTransport object: bot_daily.py Copy Ask AI # ... async def main ( room_url : str , token : str , callId : str , callDomain : str ): async with aiohttp.ClientSession() as session: diallin_settings = DailyDialinSettings( call_id = callId, call_domain = callDomain ) transport = DailyTransport( room_url, token, "Chatbot" , DailyParams( api_url = daily_api_url, api_key = daily_api_key, dialin_settings = diallin_settings, audio_in_enabled = True , audio_out_enabled = True , video_out_enabled = False , vad_analyzer = SileroVADAnalyzer(), transcription_enabled = True , ) ) # ... your bot code if __name__ == "__main__" : parser = argparse.ArgumentParser( description = "Pipecat Simple ChatBot" ) parser.add_argument( "-u" , type = str , help = "Room URL" ) parser.add_argument( "-t" , type = str , help = "Token" ) parser.add_argument( "-i" , type = str , help = "Call ID" ) parser.add_argument( "-d" , type = str , help = "Call Domain" ) config = parser.parse_args() asyncio.run(main(config.u, config.t, config.i, config.d)) Optionally, we can listen and respond to the on_dialin_ready event manually. This is useful if you have specific scenarios in which you want to know that the SIP worker is ready and the call can be forwarded. This would stop any hold music and connect the end-user to our Pipecat bot. Copy Ask AI @transport.event_handler ( "on_dialin_ready" ) async def on_dialin_ready ( transport , cdata ): print ( f "on_dialin_ready" , cdata) Since we’re using Daily as a phone vendor, this method is handled internally by the Pipecat Daily service. It can, however, be useful to override this default behaviour if you want to configure your bot in a certain way as soon as the call is ready, as sketched below. 
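A minimal, hedged sketch of such an override (the event handler is real, but the body is illustrative and reuses the task and messages objects from the bot code) might start the conversation as soon as the SIP worker reports ready:

@transport.event_handler("on_dialin_ready")
async def on_dialin_ready(transport, cdata):
    # Illustrative only: kick off the conversation the moment the call can be
    # forwarded, instead of waiting for the participant-joined event below.
    await task.queue_frames([LLMMessagesFrame(messages)])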
Typically, however, initial setup is done in the on_first_participant_joined event after the user has joined the session. Copy Ask AI @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): transport.capture_participant_transcription(participant[ "id" ]) await task.queue_frames([LLMMessagesFrame(messages)]) Overview Dial-in: WebRTC (Twilio + Daily) On this page Things you’ll need Do I need to provision my phone numbers through Daily? Purchasing a phone number Configuring your bot runner Orchestrating incoming calls Creating a new SIP-enabled room Configuring your Pipecat bot Assistant Responses are generated using AI and may contain mistakes.
|
transport_daily_3688deee.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/transport/daily#param-on-app-message
|
2 |
+
Title: Daily WebRTC - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily WebRTC - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Transport Daily WebRTC Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Daily WebRTC SmallWebRTCTransport FastAPI WebSocket WebSocket Server Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview DailyTransport provides real-time audio and video communication capabilities using Daily’s WebRTC platform. It supports bidirectional audio/video streams, transcription, voice activity detection (VAD), and participant management. Installation To use DailyTransport , install the required dependencies: Copy Ask AI pip install "pipecat-ai[daily]" You’ll also need to set up your Daily API key as an environment variable: DAILY_API_KEY . You can obtain a Daily API key by signing up at Daily . Configuration Constructor Parameters room_url str required Daily room URL token str | None Daily room token bot_name str required Name of the bot in the room params DailyParams default: "DailyParams()" Transport configuration parameters DailyParams Configuration api_url str default: "https://api.daily.co/v1" Daily API endpoint URL api_key str default: "" Daily API key for authentication Audio Output Configuration audio_out_enabled bool default: "False" Enable audio output capabilities audio_out_is_live bool default: "False" Enable live audio streaming mode audio_out_sample_rate int default: "None" Audio output sample rate in Hz audio_out_channels int default: "1" Number of audio output channels audio_out_bitrate int default: "96000" Audio output bitrate in bits per second Audio Input Configuration audio_in_enabled bool default: "False" Enable audio input capabilities audio_in_passthrough bool default: "False" When enabled, incoming audio frames are pushed downstream audio_in_sample_rate int default: "None" Audio input sample rate in Hz audio_in_channels int default: "1" Number of audio input channels audio_in_filter Optional[BaseAudioFilter] default: "None" Audio filter for input processing. Supported filters are: KrispFilter() and NoisereduceFilter() . See the KrispFilter() reference docs and NoisereduceFilter() reference docs ` for more information. Video Output Configuration video_out_enabled bool default: "False" Enable video output capabilities video_out_is_live bool default: "False" Enable live video streaming mode video_out_width int default: "1024" Video output width in pixels video_out_height int default: "768" Video output height in pixels video_out_bitrate int default: "800000" Video output bitrate in bits per second video_out_framerate int default: "30" Video output frame rate video_out_color_format str default: "RGB" Video color format (RGB, BGR, etc.) Voice Activity Detection (VAD) vad_analyzer VADAnalyzer | None default: "None" Voice Activity Detection analyzer. You can set this to either SileroVADAnalyzer() or WebRTCVADAnalyzer() . SileroVADAnalyzer is the recommended option. Learn more about the SileroVADAnalyzer . 
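Putting the audio and VAD options above together, here is a minimal sketch of a DailyParams configuration that enables audio in both directions, applies an input noise filter, and uses the recommended VAD analyzer. The filter choice is illustrative; KrispFilter() has its own dependency and licensing requirements, so swap it in or omit the filter entirely as needed.

from pipecat.audio.filters.noisereduce_filter import NoisereduceFilter
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams

# Sketch: audio I/O enabled, inbound audio cleaned up, speech detected via Silero VAD
params = DailyParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    audio_in_filter=NoisereduceFilter(),  # or KrispFilter(), per the options above
    vad_analyzer=SileroVADAnalyzer(),
)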
Feature Settings dialin_settings Optional[DailyDialinSettings] default: "None" Configuration for dial-in functionality transcription_enabled bool default: "False" Enable real-time transcription transcription_settings DailyTranscriptionSettings Configuration for transcription features Copy Ask AI class DailyTranscriptionSettings ( BaseModel ): language: str = "en" # Default language model: str = "nova-2-general" # Transcription model profanity_filter: bool = True # Filter profanity redact: bool = False # Redact sensitive information endpointing: bool = True # Enable speech endpointing punctuate: bool = True # Add punctuation includeRawResponse: bool = True # Include raw API response extra: Mapping[ str , Any] = { "interim_results" : True # Provide any Deepgram-specific settings } Basic Usage Copy Ask AI from pipecat.transports.services.daily import DailyTransport, DailyParams from pipecat.audio.vad.silero import SileroVADAnalyzer # Configure transport transport = DailyTransport( room_url = "https://your-domain.daily.co/room-name" , token = "your-room-token" , bot_name = "AI Assistant" , params = DailyParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer( params = VADParams( stop_secs = 0.5 )), ) ) # Use in pipeline pipeline = Pipeline([ transport.input(), # Handle incoming audio/video stt, # Speech-to-text llm, # Language model tts, # Text-to-speech transport.output() # Handle outgoing audio/video ]) # Use transport event handlers to manage the call lifecycle # Event handler for when the user joins the room: @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): # Start transcription for the user await transport.capture_participant_transcription(participant[ "id" ]) # Kick off the conversation. await task.queue_frames([context_aggregator.user().get_context_frame()]) # Event handler for when the user leaves the room: @transport.event_handler ( "on_participant_left" ) async def on_participant_left ( transport , participant , reason ): # Cancel the pipeline, which stops processing and removes the bot from the room await task.cancel() Event Callbacks DailyTransport provides a comprehensive callback system for handling various events. Register callbacks using the @transport.event_handler() decorator. Connection Events on_joined async callback Called when the bot successfully joins the room. Parameters: transport : The DailyTransport instance data : Dictionary containing room join information Copy Ask AI @transport.event_handler ( "on_joined" ) async def on_joined ( transport , data : Dict[ str , Any]): logger.info( f "Joined room with data: { data } " ) on_left async callback Called when the bot leaves the room. Parameters: transport : The DailyTransport instance Copy Ask AI @transport.event_handler ( "on_left" ) async def on_left ( transport ): logger.info( "Left room" ) Participant Events on_first_participant_joined async callback Called when the first participant joins the room. Useful for initializing conversations. 
Parameters: transport : The DailyTransport instance participant : Dictionary containing participant information Copy Ask AI @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant : Dict[ str , Any]): await transport.capture_participant_transcription(participant[ "id" ]) await task.queue_frames([LLMMessagesFrame(initial_messages)]) on_participant_joined async callback Called when any participant joins the room. Parameters: transport : The DailyTransport instance participant : Dictionary containing participant information Copy Ask AI @transport.event_handler ( "on_participant_joined" ) async def on_participant_joined ( transport , participant : Dict[ str , Any]): logger.info( f "Participant joined: { participant[ 'id' ] } " ) on_participant_left async callback Called when a participant leaves the room. Parameters: transport : The DailyTransport instance participant : Dictionary containing participant information reason : String describing why the participant left, “leftCall” | “hidden” Copy Ask AI @transport.event_handler ( "on_participant_left" ) async def on_participant_left ( transport , participant : Dict[ str , Any], reason : str ): logger.info( f "Participant { participant[ 'id' ] } left: { reason } " ) # Cancel the pipeline task to stop processing and remove the bot from the Daily room await task.cancel() on_participant_updated async callback Event emitted when a participant is updated. This can mean either the participant’s metadata was updated, or the tracks belonging to the participant changed. Parameters: transport : The DailyTransport instance participant : Dictionary containing updated participant information Copy Ask AI @transport.event_handler ( "on_participant_updated" ) async def on_participant_updated ( transport , participant : Dict[ str , Any]): logger.info( f "Participant updated: { participant } " ) Communication Events on_app_message async callback Event emitted when a custom app message is received from another participant or via the REST API. Parameters: transport : The DailyTransport instance message : The message content (any type) sender : String identifier of the message sender Copy Ask AI @transport.event_handler ( "on_app_message" ) async def on_app_message ( transport , message : Any, sender : str ): logger.info( f "Message from { sender } : { message } " ) on_call_state_updated async callback Event emitted when the call state changes, normally as a consequence of joining or leaving the call. Parameters: transport : The DailyTransport instance state : String representing the new call state. Learn more about call states . Copy Ask AI @transport.event_handler ( "on_call_state_updated" ) async def on_call_state_updated ( transport , state : str ): logger.info( f "Call state updated: { state } " ) Dial Events Dial-in on_dialin_ready async callback Event emitted when dial-in is ready. This happens after the room has connected to the SIP endpoint and the system is ready to receive dial-in calls. Parameters: transport : The DailyTransport instance sip_endpoint : String containing the SIP endpoint information Copy Ask AI @transport.event_handler ( "on_dialin_ready" ) async def on_dialin_ready ( transport , sip_endpoint : str ): logger.info( f "Dial-in ready at: { sip_endpoint } " ) on_dialin_connected async callback Event emitted when the session with the dial-in remote end is established (i.e. SIP endpoint or PSTN are connected to the Daily room). 
Note: connected does not mean media (audio or video) has started flowing between the room and PSTN; it means the room received the connection request and both endpoints are negotiating the media flow. Parameters: transport : The DailyTransport instance data : Dictionary containing connection information. See DialinConnectedEvent . Copy Ask AI @transport.event_handler ( "on_dialin_connected" ) async def on_dialin_connected ( transport , data : Any): logger.info( f "Dial-in connected: { data } " ) on_dialin_stopped async callback Event emitted when the dial-in remote end disconnects the call. Parameters: transport : The DailyTransport instance data : Dictionary containing connection information. See DialinStoppedEvent . Copy Ask AI @transport.event_handler ( "on_dialin_stopped" ) async def on_dialin_stopped ( transport , data : Any): logger.info( f "Dial-in stopped: { data } " ) on_dialin_error async callback Event emitted in the case of dial-in errors which are fatal and the service cannot proceed. For example, an error in SDP negotiation is fatal to the media/SIP pipeline and will result in dialin-error being triggered. Parameters: transport : The DailyTransport instance data : Dictionary containing error information. See DialinEvent . Copy Ask AI @transport.event_handler ( "on_dialin_error" ) async def on_dialin_error ( transport , data : Any): logger.error( f "Dial-in error: { data } " ) on_dialin_warning async callback Event emitted when there is a dial-in non-fatal error, such as the selected codec not being used and a fallback codec being utilized. Parameters: transport : The DailyTransport instance data : Dictionary containing warning information. See DialinEvent . Copy Ask AI @transport.event_handler ( "on_dialin_warning" ) async def on_dialin_warning ( transport , data : Any): logger.warning( f "Dial-in warning: { data } " ) Dial-out on_dialout_answered async callback Event emitted when the session with the dial-out remote end is answered. Parameters: transport : The DailyTransport instance data : Dictionary containing call information. Learn more . Copy Ask AI @transport.event_handler ( "on_dialout_answered" ) async def on_dialout_answered ( transport , data : Any): logger.info( f "Dial-out answered: { data } " ) on_dialout_connected async callback Event emitted when the session with the dial-out remote end is established. Parameters: transport : The DailyTransport instance data : Dictionary containing connection information. Learn more . Copy Ask AI @transport.event_handler ( "on_dialout_connected" ) async def on_dialout_connected ( transport , data : Any): logger.info( f "Dial-out connected: { data } " ) on_dialout_stopped async callback Event emitted when the dial-out session is stopped. Parameters: transport : The DailyTransport instance data : Dictionary containing connection information. Learn more . Copy Ask AI @transport.event_handler ( "on_dialout_stopped" ) async def on_dialout_stopped ( transport , data : Any): logger.info( f "Dial-out stopped: { data } " ) on_dialout_error async callback Event emitted in the case of dial-out errors which are fatal and the service cannot proceed. For example, an error in SDP negotiation is fatal to the media/SIP pipeline and will result in dialout-error being triggered. Parameters: transport : The DailyTransport instance data : Dictionary containing error information. Learn more . 
Copy Ask AI @transport.event_handler ( "on_dialout_error" ) async def on_dialout_error ( transport , data : Any): logger.error( f "Dial-out error: { data } " ) on_dialout_warning async callback Event emitted when there is a dial-out non-fatal error, such as the selected codec not being used and a fallback codec being utilized. Parameters: transport : The DailyTransport instance data : Dictionary containing warning information. Learn more . Copy Ask AI @transport.event_handler ( "on_dialout_warning" ) async def on_dialout_warning ( transport , data : Any): logger.warning( f "Dial-out warning: { data } " ) Transcription Events on_transcription_message async callback Called when a transcription message is received. This includes both interim and final transcriptions. Parameters: transport : The DailyTransport instance message : Dictionary containing transcription data. Learn more . Copy Ask AI @transport.event_handler ( "on_transcription_message" ) async def on_transcription_message ( transport , message : Dict[ str , Any]): participant_id = message.get( "participantId" ) text = message.get( "text" ) is_final = message[ "rawResponse" ][ "is_final" ] logger.info( f "Transcription from { participant_id } : { text } (final: { is_final } )" ) Recording Events on_recording_started async callback Called when a room recording starts successfully. Parameters: transport : The DailyTransport instance status : Dictionary containing recording status information. Learn more . Copy Ask AI @transport.event_handler ( "on_recording_started" ) async def on_recording_started ( transport , status : Dict[ str , Any]): logger.info( f "Recording started with status: { status } " ) on_recording_stopped async callback Called when a room recording stops. Parameters: transport : The DailyTransport instance stream_id : String identifier of the stopped recording stream Copy Ask AI @transport.event_handler ( "on_recording_stopped" ) async def on_recording_stopped ( transport , stream_id : str ): logger.info( f "Recording stopped for stream: { stream_id } " ) on_recording_error async callback Called when an error occurs during recording. Parameters: transport : The DailyTransport instance stream_id : String identifier of the recording stream message : Error message describing what went wrong Copy Ask AI @transport.event_handler ( "on_recording_error" ) async def on_recording_error ( transport , stream_id : str , message : str ): logger.error( f "Recording error for stream { stream_id } : { message } " ) Error Events on_error async callback Called when a transport error occurs. Parameters: transport : The DailyTransport instance error : String containing error details Copy Ask AI @transport.event_handler ( "on_error" ) async def on_error ( transport , error : str ): logger.error( f "Transport error: { error } " ) Notes on Callbacks All callbacks are asynchronous Callbacks are executed in the order they were registered Multiple handlers can be registered for the same event Exceptions in callbacks are logged but don’t stop the transport Callbacks should be lightweight to avoid blocking the event loop Heavy processing should be offloaded to separate tasks Frame Types Input Frames InputAudioRawFrame Frame Raw audio data from participants UserImageRawFrame Frame Video frames from participants Output Frames OutputAudioRawFrame Frame Audio data to be sent OutputImageRawFrame Frame Video frames to be sent Methods Room Management participants method Returns a dictionary of all participants currently in the room. 
Copy Ask AI def participants () -> Dict[ str , Any] See participants for more details. participant_counts method Returns participant count statistics for the room. Copy Ask AI def participant_counts () -> Dict[ str , int ] See participant_counts for more details. send_message async method Sends a message to participants in the Daily room. Messages can be directed to all participants or targeted to a specific participant. Copy Ask AI async def send_message ( frame : TransportMessageFrame | TransportMessageUrgentFrame) Parameters: frame : A transport message frame containing the message to send. Can be either: TransportMessageFrame : Standard message TransportMessageUrgentFrame : High-priority message Message Structure: Messages should conform to the expected format for the receiving system. For RTVI messages, use: Copy Ask AI { "label" : "rtvi-ai" , # Required for RTVI protocol "type" : "message-type" , # Defines how the client will process it "id" : "unique-id" , # Unique identifier for the message "data" : { # Custom payload # Your custom data fields } } Example - Standard Message: Copy Ask AI import uuid # Send a standard message to all participants message_data = { "label" : "rtvi-ai" , "type" : "custom-message" , "id" : str (uuid.uuid4()), "data" : { "message" : "Hello everyone!" , "timestamp" : datetime.datetime.now().isoformat() } } await transport.send_message(TransportMessageFrame( message = message_data)) Example - Urgent Message to Specific Participant: Copy Ask AI # Send urgent message to specific participant urgent_data = { "label" : "rtvi-ai" , "type" : "server-message" , "id" : str (uuid.uuid4()), "data" : { "message" : "Private message" , "importance" : "high" } } await transport.send_message( DailyTransportMessageUrgentFrame( message = urgent_data, participant_id = "participant-123" ) ) The message will be delivered to: All participants if no specific participant_id is set Only the specified participant if participant_id is set Through the Daily app messaging system Can be received by registering an on_app_message handler Use TransportMessageUrgentFrame for high-priority messages that need immediate delivery. Media Control send_image async method Sends an image frame to the room. Copy Ask AI async def send_image ( frame : OutputImageRawFrame | SpriteFrame) -> None Parameters: frame : Image frame to send, either raw image data or sprite animation send_audio async method Sends an audio frame to the room. Copy Ask AI async def send_audio ( frame : OutputAudioRawFrame) -> None Parameters: frame : Audio frame to send Video Management capture_participant_video async method Starts capturing video from a specific participant. Copy Ask AI async def capture_participant_video ( participant_id : str , framerate : int = 30 , video_source : str = "camera" , color_format : str = "RGB" ) -> None Parameters: participant_id : ID of the participant to capture framerate : Target frame rate (default: 30) video_source : Video source type (default: “camera”) color_format : Color format of the video (default: “RGB”) To request an image from the camera stream, set the framerate to 0 and push a UserImageRequestFrame . Transcription Control capture_participant_transcription async method Starts capturing and transcribing audio from a specific participant. 
Copy Ask AI async def capture_participant_transcription ( participant_id : str ) -> None Parameters: participant_id : ID of the participant to transcribe update_transcription async method Updates the transcription configuration for specific participants or instances. Copy Ask AI async def update_transcription ( participants : List[ str ] | None = None , instance_id : str | None = None ) -> None Parameters: participants : Optional list of participant IDs to transcribe. If None, transcribes all participants. instance_id : Optional specific transcription instance ID to update Example: Copy Ask AI # Update transcription for specific participants await transport.update_transcription( participants = [ "participant-123" , "participant-456" ]) # Update specific transcription instance await transport.update_transcription( instance_id = "transcription-789" ) Recording Control start_recording async method Starts recording the room session. Copy Ask AI async def start_recording ( streaming_settings : Dict = None , stream_id : str = None , force_new : bool = None ) -> None Parameters: streaming_settings : Recording configuration settings stream_id : Optional stream identifier force_new : Force start a new recording See start_recording for more details. stop_recording async method Stops an active recording. Copy Ask AI async def stop_recording ( stream_id : str = None ) -> None Parameters: stream_id : Optional stream identifier to stop specific recording See stop_recording for more details. Dial-out Control start_dialout async method Initiates a dial-out call. Copy Ask AI async def start_dialout ( settings : Dict = None ) -> None Parameters: settings : Dial-out configuration settings See start_dialout for more details. stop_dialout async method Stops an active dial-out call. Copy Ask AI async def stop_dialout ( participant_id : str ) -> None Parameters: participant_id : ID of the participant to disconnect See stop_dialout for more details. send_dtmf async method Sends DTMF tones in an existing dial-out session. Copy Ask AI async def send_dtmf ( settings : Dict = None ) -> None Parameters: settings : DTMF settings See send_dtmf for more details. Subscription Management update_subscriptions async method Updates subscriptions and subscription profiles. This function allows you to update subscription profiles and at the same time assign specific subscription profiles to a participant and even change specific settings for some participants. Copy Ask AI async def update_subscriptions ( participant_settings : Dict = None , profile_settings : Dict = None ) -> None Parameters: participant_settings : Per-participant subscription settings profile_settings : Global subscription profile settings See update_subscriptions for more details. Participant Admin When using the DailyTransport, if your bot is given a meeting token with owner or participant admin permissions, it can remote-control other participants’ capabilities and permissions. With this control, the bot can modify settings for how others send and receive media, how they are seen in a session, and whether they have administrative controls. update_remote_participants async method Updates permissions and settings for specific participants in the room. Copy Ask AI async def update_remote_participants ( remote_participants : Mapping[ str , Any] = None ) -> None Parameters: remote_participants : Dictionary mapping participant IDs to their updated settings This method allows fine-grained control over participant capabilities and permissions in the Daily room. 
Examples: Copy Ask AI # Mute a participant's microphone await transport.update_remote_participants({ "participant-123" : { "inputsEnabled" : { "microphone" : False } } }) # Revoke send permissions (mutes the participant and prevents them from unmuting) await transport.update_remote_participants({ "participant-123" : { "permissions" : { "canSend" : [] # Empty list revokes all send permissions } } }) # Configure selective audio routing await transport.update_remote_participants({ "participant-123" : { # the customer "permissions" : { "canReceive" : { "base" : False , # Don't receive from anyone by default "byUserId" : { "hold-music" : True # Only receive from hold music player } } } } }) # Set up user-to-user permissions mapping (e.g., when transferring calls) # Assumes customer was previously unable to hear anyone (they were on hold) and agent was speaking to bot await transport.update_remote_participants({ "participant-123" : { # the customer "permissions" : { "base" : False , "canReceive" : { "byUserId" : { "agent" : True }} } }, "participant-456" : { # the agent "permissions" : { "base" : False , "canReceive" : { "byUserId" : { "customer" : True }} } } }) # Promote a participant to be a participant admin await transport.update_remote_participants({ "participant-123" : { "permissions" : { "canAdmin" : [ "participants" ] } } }) Permissions Settings: For comprehensive details on permissions settings, see the Daily API documentation: daily-python’s update_remote_participants() documentation daily-js’s updateParticipant() -> updatePermission and participants().permissions Chat messages send_prebuilt_chat_message async method When testing with Daily Prebuilt, you can send messages to the Prebuilt chat using this method. Parameters: message (str): The chat message to send user_name (str): The name of the user sending the message See send_prebuilt_chat_message for more details. Properties participant_id string Returns the transport’s participant ID in the room. room_url string Returns the transport’s room URL for the room currently in use. Notes Supports real-time audio/video communication Handles WebRTC connection management Provides voice activity detection Includes transcription capabilities Manages participant interactions Supports recording and streaming Thread-safe processing All callbacks are asynchronous Multiple handlers can be registered for the same event Exceptions in callbacks are logged but don’t stop the transport Supported Services SmallWebRTCTransport On this page Overview Installation Configuration Constructor Parameters DailyParams Configuration Audio Output Configuration Audio Input Configuration Video Output Configuration Voice Activity Detection (VAD) Feature Settings Basic Usage Event Callbacks Connection Events Participant Events Communication Events Dial Events Dial-in Dial-out Transcription Events Recording Events Error Events Notes on Callbacks Frame Types Input Frames Output Frames Methods Room Management Media Control Video Management Transcription Control Recording Control Dial-out Control Subscription Management Participant Admin Chat messages Properties Notes Assistant Responses are generated using AI and may contain mistakes.
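The video capture notes above describe requesting a single camera image by setting framerate to 0 and pushing a UserImageRequestFrame, but give no code. A minimal sketch of that flow, assuming a connected DailyTransport and a running PipelineTask; the request_snapshot helper is hypothetical and the UserImageRequestFrame keyword name may differ by pipecat version:

from pipecat.frames.frames import UserImageRequestFrame

async def request_snapshot(transport, task, participant_id: str):
    # framerate=0 means frames are not streamed continuously; one is delivered
    # each time an image request frame for this participant reaches the transport.
    await transport.capture_participant_video(
        participant_id,
        framerate=0,
        video_source="camera",
        color_format="RGB",
    )
    # Assumption: the frame accepts the participant ID as user_id.
    await task.queue_frame(UserImageRequestFrame(user_id=participant_id))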
transport_fastapi-websocket_03bba556.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/transport/fastapi-websocket#param-output-audio-raw-frame
Title: FastAPI WebSocket - Pipecat
==================================================

FastAPI WebSocket - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Transport FastAPI WebSocket Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Daily WebRTC SmallWebRTCTransport FastAPI WebSocket WebSocket Server Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview FastAPIWebsocketTransport provides WebSocket support for FastAPI web applications, enabling real-time audio communication. It supports bidirectional audio streams and voice activity detection (VAD). FastAPIWebsocketTransport is best suited for server-side applications and prototyping client/server apps. For client/server production applications, we strongly recommend using a WebRTC-based transport for robust network and media handling. Installation To use FastAPIWebsocketTransport , install the required dependencies: Copy Ask AI pip install "pipecat-ai[websocket]" Configuration Constructor Parameters websocket WebSocket required FastAPI WebSocket connection instance params FastAPIWebsocketParams required Transport configuration parameters FastAPIWebsocketParams Configuration add_wav_header bool default: "False" Add WAV header to audio frames serializer FrameSerializer required Frame serializer for WebSocket messages. Common options include: ExotelFrameSerializer - For Exotel Websocket streaming integration PlivoFrameSerializer - For Plivo Websocket streaming integration TelnyxFrameSerializer - For Telnyx WebSocket streaming integration TwilioFrameSerializer - For Twilio Media Streams integration See the Frame Serializers documentation for more details. session_timeout int | None default: "None" Session timeout in seconds. If set, triggers timeout event when no activity is detected Audio Configuration audio_in_enabled bool default: false Enable audio input from the WebRTC client audio_in_passthrough bool default: "False" When enabled, incoming audio frames are pushed downstream audio_out_enabled bool default: "False" Enable audio output capabilities audio_out_sample_rate int default: "None" Audio output sample rate in Hz audio_out_channels int default: "1" Number of audio output channels Voice Activity Detection (VAD) vad_analyzer VADAnalyzer | None default: "None" Voice Activity Detection analyzer. You can set this to either SileroVADAnalyzer() or WebRTCVADAnalyzer() . SileroVADAnalyzer is the recommended option. Learn more about the SileroVADAnalyzer . 
Basic Usage Copy Ask AI from fastapi import FastAPI, WebSocket from pipecat.transports.network.fastapi_websocket import ( FastAPIWebsocketTransport, FastAPIWebsocketParams ) from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.pipeline.pipeline import Pipeline from pipecat.serializers.twilio import TwilioFrameSerializer app = FastAPI() @app.websocket ( "/ws" ) async def websocket_endpoint ( websocket : WebSocket): await websocket.accept() # Configure transport transport = FastAPIWebsocketTransport( websocket = websocket, params = FastAPIWebsocketParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer(), serializer = TwilioFrameSerializer(), ) ) # Use in pipeline pipeline = Pipeline([ transport.input(), # Handle incoming audio stt, # Speech-to-text llm, # Language model tts, # Text-to-speech transport.output() # Handle outgoing audio ]) # Run pipeline task = PipelineTask(pipeline) await PipelineRunner().run(task) Check out the Twilio Chatbot example to see how to use the FastAPI transport in a phone application. Event Callbacks FastAPIWebsocketTransport provides callbacks for handling client connection events. Register callbacks using the @transport.event_handler() decorator. Connection Events on_client_connected async callback Called when a client connects to the WebSocket endpoint. Parameters: transport : The FastAPIWebsocketTransport instance client : FastAPI WebSocket connection object Copy Ask AI @transport.event_handler ( "on_client_connected" ) async def on_client_connected ( transport , client ): logger.info( "Client connected" ) # Initialize conversation await task.queue_frames([LLMMessagesFrame(initial_messages)]) on_client_disconnected async callback Called when a client disconnects from the WebSocket endpoint. Parameters: transport : The FastAPIWebsocketTransport instance client : FastAPI WebSocket connection object Copy Ask AI @transport.event_handler ( "on_client_disconnected" ) async def on_client_disconnected ( transport , client ): logger.info( "Client disconnected" ) await task.queue_frames([EndFrame()]) on_session_timeout async callback Called when a session times out (if session_timeout is configured). Parameters: transport : The FastAPIWebsocketTransport instance client : FastAPI WebSocket connection object Copy Ask AI @transport.event_handler ( "on_session_timeout" ) async def on_session_timeout ( transport , client ): logger.info( "Session timeout" ) # Handle timeout (e.g., send message, close connection) Frame Types Input Frames InputAudioRawFrame Frame Raw audio data from the WebSocket client Output Frames OutputAudioRawFrame Frame Audio data to be sent to the WebSocket client Notes Integrates with FastAPI web applications Supports real-time audio communication Handles WebSocket connection management Provides voice activity detection Supports session timeouts All callbacks are asynchronous Compatible with various frame serializers SmallWebRTCTransport WebSocket Server On this page Overview Installation Configuration Constructor Parameters FastAPIWebsocketParams Configuration Audio Configuration Voice Activity Detection (VAD) Basic Usage Event Callbacks Connection Events Frame Types Input Frames Output Frames Notes Assistant Responses are generated using AI and may contain mistakes.
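The on_session_timeout stub above only logs. A slightly fuller handler, as a sketch: speak a goodbye and then end the pipeline, assuming it is registered inside the same endpoint where task was created and that the frames are imported from pipecat.frames.frames:

from pipecat.frames.frames import EndFrame, TTSSpeakFrame

@transport.event_handler("on_session_timeout")
async def on_session_timeout(transport, client):
    # Say a short goodbye, then push EndFrame so the pipeline shuts down cleanly.
    await task.queue_frames([
        TTSSpeakFrame("I haven't heard from you in a while. Goodbye!"),
        EndFrame(),
    ])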
transport_fastapi-websocket_dee48b44.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/transport/fastapi-websocket#param-on-client-disconnected
Title: FastAPI WebSocket - Pipecat
==================================================

FastAPI WebSocket - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Transport FastAPI WebSocket Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Daily WebRTC SmallWebRTCTransport FastAPI WebSocket WebSocket Server Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview FastAPIWebsocketTransport provides WebSocket support for FastAPI web applications, enabling real-time audio communication. It supports bidirectional audio streams and voice activity detection (VAD). FastAPIWebsocketTransport is best suited for server-side applications and prototyping client/server apps. For client/server production applications, we strongly recommend using a WebRTC-based transport for robust network and media handling. Installation To use FastAPIWebsocketTransport , install the required dependencies: Copy Ask AI pip install "pipecat-ai[websocket]" Configuration Constructor Parameters websocket WebSocket required FastAPI WebSocket connection instance params FastAPIWebsocketParams required Transport configuration parameters FastAPIWebsocketParams Configuration add_wav_header bool default: "False" Add WAV header to audio frames serializer FrameSerializer required Frame serializer for WebSocket messages. Common options include: ExotelFrameSerializer - For Exotel Websocket streaming integration PlivoFrameSerializer - For Plivo Websocket streaming integration TelnyxFrameSerializer - For Telnyx WebSocket streaming integration TwilioFrameSerializer - For Twilio Media Streams integration See the Frame Serializers documentation for more details. session_timeout int | None default: "None" Session timeout in seconds. If set, triggers timeout event when no activity is detected Audio Configuration audio_in_enabled bool default: false Enable audio input from the WebRTC client audio_in_passthrough bool default: "False" When enabled, incoming audio frames are pushed downstream audio_out_enabled bool default: "False" Enable audio output capabilities audio_out_sample_rate int default: "None" Audio output sample rate in Hz audio_out_channels int default: "1" Number of audio output channels Voice Activity Detection (VAD) vad_analyzer VADAnalyzer | None default: "None" Voice Activity Detection analyzer. You can set this to either SileroVADAnalyzer() or WebRTCVADAnalyzer() . SileroVADAnalyzer is the recommended option. Learn more about the SileroVADAnalyzer . 
Basic Usage Copy Ask AI from fastapi import FastAPI, WebSocket from pipecat.transports.network.fastapi_websocket import ( FastAPIWebsocketTransport, FastAPIWebsocketParams ) from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.pipeline.pipeline import Pipeline from pipecat.serializers.twilio import TwilioFrameSerializer app = FastAPI() @app.websocket ( "/ws" ) async def websocket_endpoint ( websocket : WebSocket): await websocket.accept() # Configure transport transport = FastAPIWebsocketTransport( websocket = websocket, params = FastAPIWebsocketParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer(), serializer = TwilioFrameSerializer(), ) ) # Use in pipeline pipeline = Pipeline([ transport.input(), # Handle incoming audio stt, # Speech-to-text llm, # Language model tts, # Text-to-speech transport.output() # Handle outgoing audio ]) # Run pipeline task = PipelineTask(pipeline) await PipelineRunner().run(task) Check out the Twilio Chatbot example to see how to use the FastAPI transport in a phone application. Event Callbacks FastAPIWebsocketTransport provides callbacks for handling client connection events. Register callbacks using the @transport.event_handler() decorator. Connection Events on_client_connected async callback Called when a client connects to the WebSocket endpoint. Parameters: transport : The FastAPIWebsocketTransport instance client : FastAPI WebSocket connection object Copy Ask AI @transport.event_handler ( "on_client_connected" ) async def on_client_connected ( transport , client ): logger.info( "Client connected" ) # Initialize conversation await task.queue_frames([LLMMessagesFrame(initial_messages)]) on_client_disconnected async callback Called when a client disconnects from the WebSocket endpoint. Parameters: transport : The FastAPIWebsocketTransport instance client : FastAPI WebSocket connection object Copy Ask AI @transport.event_handler ( "on_client_disconnected" ) async def on_client_disconnected ( transport , client ): logger.info( "Client disconnected" ) await task.queue_frames([EndFrame()]) on_session_timeout async callback Called when a session times out (if session_timeout is configured). Parameters: transport : The FastAPIWebsocketTransport instance client : FastAPI WebSocket connection object Copy Ask AI @transport.event_handler ( "on_session_timeout" ) async def on_session_timeout ( transport , client ): logger.info( "Session timeout" ) # Handle timeout (e.g., send message, close connection) Frame Types Input Frames InputAudioRawFrame Frame Raw audio data from the WebSocket client Output Frames OutputAudioRawFrame Frame Audio data to be sent to the WebSocket client Notes Integrates with FastAPI web applications Supports real-time audio communication Handles WebSocket connection management Provides voice activity detection Supports session timeouts All callbacks are asynchronous Compatible with various frame serializers SmallWebRTCTransport WebSocket Server On this page Overview Installation Configuration Constructor Parameters FastAPIWebsocketParams Configuration Audio Configuration Voice Activity Detection (VAD) Basic Usage Event Callbacks Connection Events Frame Types Input Frames Output Frames Notes Assistant Responses are generated using AI and may contain mistakes.
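As a variation on the Basic Usage above (reusing its imports), a telephony-flavored parameter sketch with 8 kHz output audio and a three-minute inactivity timeout; the values are illustrative only and should match your provider and serializer:

params = FastAPIWebsocketParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    add_wav_header=False,
    audio_out_sample_rate=8000,   # illustrative: match the telephony codec rate
    session_timeout=180,          # fires on_session_timeout after 3 minutes of inactivity
    vad_analyzer=SileroVADAnalyzer(),
    serializer=TwilioFrameSerializer(),
)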
transport_small-webrtc_222927a9.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/transport/small-webrtc#how-to-connect-with-smallwebrtctransport
Title: SmallWebRTCTransport - Pipecat
==================================================

SmallWebRTCTransport - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Transport SmallWebRTCTransport Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Daily WebRTC SmallWebRTCTransport FastAPI WebSocket WebSocket Server Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview SmallWebRTCTransport enables peer-to-peer WebRTC connections between clients and your Pipecat application. It implements bidirectional audio and video streaming using WebRTC for real-time communication. This transport is intended for lightweight implementations, particularly for local development and testing. It expects your clients to include a corresponding SmallWebRTCTransport implementation. See here for the JavaScript implementation. SmallWebRTCTransport is best used for testing and development. For production deployments with scale, consider using the DailyTransport , as it has global, low-latency infrastructure. Installation To use SmallWebRTCTransport , install the required dependencies: Copy Ask AI pip install pipecat-ai[webrtc] Class Reference SmallWebRTCConnection SmallWebRTCConnection manages the WebRTC connection details, peer connection state, and ICE candidates. It handles the signaling process and media tracks. Copy Ask AI SmallWebRTCConnection( ice_servers = None ) ice_servers Union[List[str], List[IceServer]] List of STUN/TURN server URLs for ICE connection establishment. Can be provided as strings or as IceServer objects. Methods initialize async method Initialize the connection with a client’s SDP offer. Parameters: sdp : String containing the Session Description Protocol data from client’s offer type : String representing the SDP message type (typically “offer”) Copy Ask AI await webrtc_connection.initialize( sdp = client_sdp, type = "offer" ) connect async method Establish the WebRTC peer connection after initialization. Copy Ask AI await webrtc_connection.connect() close async method Close the WebRTC peer connection. Copy Ask AI await webrtc_connection.close() disconnect async method Disconnect the WebRTC peer connection and send a peer left message to the client. Copy Ask AI await webrtc_connection.disconnect() renegotiate async method Handle connection renegotiation requests. Parameters: sdp : String containing the Session Description Protocol data for renegotiation type : String representing the SDP message type restart_pc : Boolean indicating whether to completely restart the peer connection (default: False) Copy Ask AI await webrtc_connection.renegotiate( sdp = new_sdp, type = "offer" , restart_pc = False ) get_answer method Retrieve the SDP answer to send back to the client. Returns a dictionary with sdp , type , and pc_id fields. Copy Ask AI answer = webrtc_connection.get_answer() # Returns: {"sdp": "...", "type": "answer", "pc_id": "..."} send_app_message method Send an application message to the client. 
Parameters: message : The message to send (will be JSON serialized) Copy Ask AI webrtc_connection.send_app_message({ "action" : "greeting" , "text" : "Hello!" }) is_connected method Check if the connection is active. Copy Ask AI if webrtc_connection.is_connected(): print ( "Connection is active" ) audio_input_track method Get the audio input track from the client. Copy Ask AI audio_track = webrtc_connection.audio_input_track() video_input_track method Get the video input track from the client. Copy Ask AI video_track = webrtc_connection.video_input_track() replace_audio_track method Replace the current audio track with a new one. Parameters: track : The new audio track to use Copy Ask AI webrtc_connection.replace_audio_track(new_audio_track) replace_video_track method Replace the current video track with a new one. Parameters: track : The new video track to use Copy Ask AI webrtc_connection.replace_video_track(new_video_track) ask_to_renegotiate method Request the client to initiate connection renegotiation. Copy Ask AI webrtc_connection.ask_to_renegotiate() event_handler decorator Register an event handler for connection events. Events: "app-message" : Called when a message is received from the client "track-started" : Called when a new track is started "track-ended" : Called when a track ends "connecting" : Called when connection is being established "connected" : Called when connection is established "disconnected" : Called when connection is lost "closed" : Called when connection is closed "failed" : Called when connection fails "new" : Called when a new connection is created Copy Ask AI @webrtc_connection.event_handler ( "connected" ) async def on_connected ( connection ): print ( f "WebRTC connection established" ) SmallWebRTCTransport SmallWebRTCTransport is the main transport class that manages both input and output transports for WebRTC communication. Copy Ask AI SmallWebRTCTransport( webrtc_connection: SmallWebRTCConnection, params: TransportParams, input_name: Optional[ str ] = None , output_name: Optional[ str ] = None ) webrtc_connection SmallWebRTCConnection required An instance of SmallWebRTCConnection that manages the WebRTC connection params TransportParams required Configuration parameters for the transport Show TransportParams properties audio_in_enabled bool default: false Enable audio input from the WebRTC client audio_in_passthrough bool default: "False" When enabled, incoming audio frames are pushed downstream audio_out_enabled bool default: false Enable audio output to the WebRTC client audio_in_sample_rate int Sample rate for incoming audio (Hz) audio_out_sample_rate int Sample rate for outgoing audio (Hz) audio_in_channels int default: 1 Number of audio input channels (1 for mono, 2 for stereo) audio_out_channels int default: 1 Number of audio output channels (1 for mono, 2 for stereo) video_in_enabled bool default: false Enable video input from the WebRTC client video_out_enabled bool default: false Enable video output to the WebRTC client video_out_width int default: 640 Width of outgoing video video_out_height int default: 480 Height of outgoing video video_out_framerate int default: 30 Framerate of outgoing video vad_analyzer VADAnalyzer Custom VAD analyzer implementation input_name str Optional name for the input transport output_name str Optional name for the output transport Methods input method Returns the input transport instance. Copy Ask AI input_transport = webrtc_transport.input() output method Returns the output transport instance. 
Copy Ask AI output_transport = webrtc_transport.output() send_image async method Send an image frame to the client. Parameters: frame : The image frame to send (OutputImageRawFrame or SpriteFrame) Copy Ask AI await webrtc_transport.send_image(image_frame) send_audio async method Send an audio frame to the client. Parameters: frame : The audio frame to send (OutputAudioRawFrame) Copy Ask AI await webrtc_transport.send_audio(audio_frame) Event Handlers on_app_message async callback Called when receiving application messages from the client. Parameters: message : The received message Copy Ask AI @webrtc_transport.event_handler ( "on_app_message" ) async def on_app_message ( message ): print ( f "Received message: { message } " ) on_client_connected async callback Called when a client successfully connects. Parameters: transport : The SmallWebRTCTransport instance webrtc_connection : The connection that was established Copy Ask AI @webrtc_transport.event_handler ( "on_client_connected" ) async def on_client_connected ( transport , webrtc_connection ): print ( f "Client connected" ) on_client_disconnected async callback Called when a client disconnects. Parameters: transport : The SmallWebRTCTransport instance webrtc_connection : The connection that was disconnected Copy Ask AI @webrtc_transport.event_handler ( "on_client_disconnected" ) async def on_client_disconnected ( transport , webrtc_connection ): print ( f "Client disconnected" ) on_client_closed async callback Called when a client connection is closed. Parameters: transport : The SmallWebRTCTransport instance webrtc_connection : The connection that was closed Copy Ask AI @webrtc_transport.event_handler ( "on_client_closed" ) async def on_client_closed ( transport , webrtc_connection ): print ( f "Client connection closed" ) Basic Usage This basic usage example shows the transport specific parts of a bot.py file required to configure your bot: Copy Ask AI from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.pipeline.pipeline import Pipeline from pipecat.transports.base_transport import TransportParams from pipecat.transports.network.small_webrtc import SmallWebRTCTransport from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection async def run_bot ( webrtc_connection ): # Create the WebRTC transport with the provided connection transport = SmallWebRTCTransport( webrtc_connection = webrtc_connection, params = TransportParams( audio_in_enabled = True , # Accept audio from the client audio_out_enabled = True , # Send audio to the client vad_analyzer = SileroVADAnalyzer(), ), ) # Set up your services and context # Create the pipeline pipeline = Pipeline([ transport.input(), # Receive audio from client stt, # Convert speech to text context_aggregator.user(), # Add user messages to context llm, # Process text with LLM tts, # Convert text to speech transport.output(), # Send audio responses to client context_aggregator.assistant(), # Add assistant responses to context ]) # Register event handlers @transport.event_handler ( "on_client_connected" ) async def on_client_connected ( transport , client ): logger.info( "Client connected" ) # Start the conversation when client connects await task.queue_frames([context_aggregator.user().get_context_frame()]) @transport.event_handler ( "on_client_disconnected" ) async def on_client_disconnected ( transport , client ): logger.info( "Client disconnected" ) @transport.event_handler ( "on_client_closed" ) async def on_client_closed ( transport , client ): logger.info( 
"Client closed" ) await task.cancel() How to connect with SmallWebRTCTransport For a client/server connection, you have two options for how to connect the client to the server: Use a Pipecat client SDK with the SmallWebRTCTransport . See the Client SDK docs to get started. Using the WebRTC API directly. This is only recommended for advanced use cases where the Pipecat client SDKs don’t have an available transport. Examples To see a complete implementation, check out the following examples: Video Transform Demonstrates real-time video processing using WebRTC transport Voice Agent Implements a voice assistant using WebRTC for audio communication Media Handling Audio Audio is processed in 20ms chunks by default. The transport handles audio format conversion and resampling as needed: Input audio is processed at 16kHz (mono) to be compatible with speech recognition services Output audio can be configured to match your application’s requirements, but it must be mono, 16-bit PCM audio Video Video is streamed using RGB format by default. The transport provides: Frame conversion between different color formats (RGB, YUV, etc.) Configurable resolution and framerate WebRTC ICE Servers Configuration When implementing WebRTC in your project, STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) servers are usually needed in cases where users are behind routers or firewalls. In local networks (e.g., testing within the same home or office network), you usually don’t need to configure STUN or TURN servers. In such cases, WebRTC can often directly establish peer-to-peer connections without needing to traverse NAT or firewalls. What are STUN and TURN Servers? STUN Server : Helps clients discover their public IP address and port when they’re behind a NAT (Network Address Translation) device (like a router). This allows WebRTC to attempt direct peer-to-peer communication by providing the public-facing IP and port. TURN Server : Used as a fallback when direct peer-to-peer communication isn’t possible due to strict NATs or firewalls blocking connections. The TURN server relays media traffic between peers. Why are ICE Servers Important? ICE (Interactive Connectivity Establishment) is a framework used by WebRTC to handle network traversal and NAT issues. The iceServers configuration provides a list of STUN and TURN servers that WebRTC uses to find the best way to connect two peers. 
Advanced Configuration ICE Servers For better connectivity, especially when testing across different networks, you can provide STUN servers: Copy Ask AI webrtc_connection = SmallWebRTCConnection( ice_servers = [ "stun:stun.l.google.com:19302" , "stun:stun1.l.google.com:19302" ] ) You can also use IceServer objects for more advanced configuration: Copy Ask AI from pipecat.transports.network.webrtc_connection import IceServer webrtc_connection = SmallWebRTCConnection( ice_servers = [ IceServer( urls = "stun:stun.l.google.com:19302" ), IceServer( urls = "turn:turn.example.com:3478" , username = "username" , credential = "password" ) ] ) Troubleshooting If clients have trouble connecting or streaming: Check browser console for WebRTC errors Ensure you’re using HTTPS in production (required for WebRTC) For testing across networks, consider using Daily which provides TURN servers Verify browser permissions for camera and microphone Daily WebRTC FastAPI WebSocket On this page Overview Installation Class Reference SmallWebRTCConnection Methods SmallWebRTCTransport Methods Event Handlers Basic Usage How to connect with SmallWebRTCTransport Examples Media Handling Audio Video WebRTC ICE Servers Configuration What are STUN and TURN Servers? Why are ICE Servers Important? Advanced Configuration ICE Servers Troubleshooting Assistant Responses are generated using AI and may contain mistakes.
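The connection methods documented above (initialize, get_answer) are typically driven by a small signaling endpoint. A minimal sketch, assuming a FastAPI app and the run_bot(webrtc_connection) coroutine from the Basic Usage section; error handling and renegotiation are omitted:

from fastapi import BackgroundTasks, FastAPI
from pipecat.transports.network.webrtc_connection import SmallWebRTCConnection

app = FastAPI()

@app.post("/api/offer")
async def offer(request: dict, background_tasks: BackgroundTasks):
    # Create a connection and apply the client's SDP offer.
    connection = SmallWebRTCConnection(ice_servers=["stun:stun.l.google.com:19302"])
    await connection.initialize(sdp=request["sdp"], type=request["type"])

    # Run the bot pipeline in the background; the transport completes the
    # connection once the pipeline starts.
    background_tasks.add_task(run_bot, connection)

    # Return the SDP answer ({"sdp", "type", "pc_id"}) so the client can finish signaling.
    return connection.get_answer()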
transport_websocket-server_63af9649.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/transport/websocket-server#param-serializer
Title: WebSocket Server - Pipecat
==================================================

WebSocket Server - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Transport WebSocket Server Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Daily WebRTC SmallWebRTCTransport FastAPI WebSocket WebSocket Server Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview WebsocketServerTransport provides a WebSocket server implementation for real-time audio communication. It supports bidirectional audio streams and voice activity detection (VAD). WebsocketServerTransport is best suited for server-side applications and prototyping client/server apps. For client/server production applications, we strongly recommend using a WebRTC-based transport for robust network and media handling. Installation To use WebsocketServerTransport , install the required dependencies: Copy Ask AI pip install "pipecat-ai[websocket]" Configuration Constructor Parameters host str default: "localhost" Host address to bind the WebSocket server port int default: "8765" Port number for the WebSocket server params WebsocketServerParams default: "WebsocketServerParams()" Transport configuration parameters WebsocketServerParams Configuration add_wav_header bool default: "False" Add WAV header to audio frames serializer FrameSerializer default: "ProtobufFrameSerializer()" Frame serializer for WebSocket messages session_timeout int | None default: "None" Session timeout in seconds. If set, triggers timeout event when no activity is detected Audio Configuration audio_in_enabled bool default: false Enable audio input from the WebRTC client audio_in_passthrough bool default: "False" When enabled, incoming audio frames are pushed downstream audio_out_enabled bool default: "False" Enable audio output capabilities audio_out_sample_rate int default: "None" Audio output sample rate in Hz audio_out_channels int default: "1" Number of audio output channels Voice Activity Detection (VAD) vad_analyzer VADAnalyzer | None default: "None" Voice Activity Detection analyzer. You can set this to either SileroVADAnalyzer() or WebRTCVADAnalyzer() . SileroVADAnalyzer is the recommended option. Learn more about the SileroVADAnalyzer . Basic Usage Copy Ask AI from pipecat.transports.network.websocket_server import ( WebsocketServerTransport, WebsocketServerParams ) from pipecat.audio.vad.silero import SileroVADAnalyzer from pipecat.pipeline.pipeline import Pipeline # Configure transport transport = WebsocketServerTransport( host = "localhost" , port = 8765 , params = WebsocketServerParams( audio_in_enabled = True , audio_out_enabled = True , add_wav_header = True , vad_analyzer = SileroVADAnalyzer(), session_timeout = 180 # 3 minutes ) ) # Use in pipeline pipeline = Pipeline([ transport.input(), # Handle incoming audio stt, # Speech-to-text llm, # Language model tts, # Text-to-speech transport.output() # Handle outgoing audio ]) Check out the Websocket Server example to see how to use this transport in a pipeline. 
Event Callbacks WebsocketServerTransport provides callbacks for handling client connection events. Register callbacks using the @transport.event_handler() decorator. Connection Events on_client_connected async callback Called when a client connects to the WebSocket server. Parameters: transport : The WebsocketServerTransport instance client : WebSocket client connection object Copy Ask AI @transport.event_handler ( "on_client_connected" ) async def on_client_connected ( transport , client ): logger.info( f "Client connected: { client.remote_address } " ) # Initialize conversation await task.queue_frames([LLMMessagesFrame(initial_messages)]) on_client_disconnected async callback Called when a client disconnects from the WebSocket server. Parameters: transport : The WebsocketServerTransport instance client : WebSocket client connection object Copy Ask AI @transport.event_handler ( "on_client_disconnected" ) async def on_client_disconnected ( transport , client ): logger.info( f "Client disconnected: { client.remote_address } " ) on_session_timeout async callback Called when a session times out (if session_timeout is configured). Parameters: transport : The WebsocketServerTransport instance client : WebSocket client connection object Copy Ask AI @transport.event_handler ( "on_session_timeout" ) async def on_session_timeout ( transport , client ): logger.info( f "Session timeout for client: { client.remote_address } " ) # Handle timeout (e.g., send message, close connection) Frame Types Input Frames InputAudioRawFrame Frame Raw audio data from the WebSocket client Output Frames OutputAudioRawFrame Frame Audio data to be sent to the WebSocket client Notes Supports real-time audio communication Best suited for server-side applications Handles WebSocket connection management Provides voice activity detection Supports session timeouts Single client per server (new connections replace existing ones) All callbacks are asynchronous FastAPI WebSocket Frame Serializer Overview On this page Overview Installation Configuration Constructor Parameters WebsocketServerParams Configuration Audio Configuration Voice Activity Detection (VAD) Basic Usage Event Callbacks Connection Events Frame Types Input Frames Output Frames Notes Assistant Responses are generated using AI and may contain mistakes.
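The Basic Usage above builds the pipeline but stops short of running it. A sketch of the remaining wiring, assuming these PipelineTask/PipelineRunner import paths (they may differ by pipecat version) and the transport and pipeline objects from that example:

from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

async def run_example(transport, pipeline):
    task = PipelineTask(pipeline)

    @transport.event_handler("on_session_timeout")
    async def on_session_timeout(transport, client):
        # End the session instead of leaving an idle socket open.
        await task.cancel()

    await PipelineRunner().run(task)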
transports_gemini-websocket_a7327b57.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/android/transports/gemini-websocket#resources
Title: Gemini Live Websocket Transport - Pipecat
==================================================

Gemini Live Websocket Transport - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Transport packages Gemini Live Websocket Transport Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages Daily WebRTC Transport Gemini Live Websocket Transport OpenAI Realtime WebRTC Transport Small WebRTC Transport C++ SDK SDK Introduction Daily WebRTC Transport The Gemini Live Websocket transport implementation enables real-time audio communication with the Gemini Multimodal Live service, using a direct websocket connection. Transports of this type are designed primarily for development and testing purposes. For production applications, you will need to build a server component with a server-friendly transport, like the DailyTransport , to securely handle API keys. Installation Add the transport dependency to your build.gradle : Copy Ask AI implementation "ai.pipecat:gemini-live-websocket-transport:0.3.7" Usage Create a client: Copy Ask AI val transport = GeminiLiveWebsocketTransport. Factory (context) val options = RTVIClientOptions ( params = RTVIClientParams ( baseUrl = null , config = GeminiLiveWebsocketTransport. buildConfig ( apiKey = "<your Gemini api key>" , generationConfig = Value. Object ( "speech_config" to Value. Object ( "voice_config" to Value. Object ( "prebuilt_voice_config" to Value. Object ( "voice_name" to Value. Str ( "Puck" ) ) ) ) ), initialUserMessage = "How tall is the Eiffel Tower?" ) ) ) val client = RTVIClient (transport, callbacks, options) client. start (). withCallback { // ... } Resources Demo Simple Chatbot Demo Source Client Transports Pipecat Android Client Reference Complete API documentation for the Pipecat Android client. Daily WebRTC Transport OpenAI Realtime WebRTC Transport On this page Installation Usage Resources Assistant Responses are generated using AI and may contain mistakes.
tts_aws_0db50b6f.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/aws#param-text-filter
Title: AWS Polly - Pipecat
==================================================

AWS Polly - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech AWS Polly Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview AWSPollyTTSService provides text-to-speech capabilities using AWS’s Polly service. It supports multiple voices, languages, and speech customization options through SSML. The older PollyTTSService class is still available but has been deprecated. Use AWSPollyTTSService instead. Installation To use AWSPollyTTSService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[aws]" You’ll also need to set up your AWS credentials as environment variables: AWS_SECRET_ACCESS_KEY AWS_ACCESS_KEY_ID AWS_SESSION_TOKEN (if using temporary credentials) AWS_REGION (defaults to “us-east-1”) Configuration Constructor Parameters api_key str AWS secret access key (can also use environment variable) aws_access_key_id str AWS access key ID (can also use environment variable) aws_session_token str AWS session token for temporary credentials (can also use environment variable) region str AWS region name (defaults to “us-east-1” if not provided) voice_id str default: "Joanna" AWS Polly voice identifier sample_rate int default: "None" Output audio sample rate in Hz (resampled from Polly’s 16kHz) text_filter BaseTextFilter default: "None" Modifies text provided to the TTS. Learn more about the available filters. params InputParams TTS configuration parameters Input Parameters Copy Ask AI class InputParams ( BaseModel ): engine: Optional[ str ] = None # Polly engine type ("standard", "neural", or "generative") language: Optional[Language] = Language. EN pitch: Optional[ str ] = None # SSML pitch adjustment rate: Optional[ str ] = None # SSML rate adjustment volume: Optional[ str ] = None # SSML volume adjustment Output Frames Control Frames TTSStartedFrame Frame Signals start of speech synthesis TTSStoppedFrame Frame Signals completion of speech synthesis Audio Frames TTSAudioRawFrame Frame Contains generated audio data with: PCM audio format Sample rate as specified (resampled from 16kHz) Single channel (mono) Error Frames ErrorFrame Frame Contains AWS Polly error information Methods See the TTS base class methods for additional functionality. 
Language Support Supports an extensive range of languages and regional variants: Language Code Description Service Code Language.AR Arabic arb Language.AR_AE Arabic (UAE) ar-AE Language.CA Catalan ca-ES Language.ZH Chinese (Mandarin) cmn-CN Language.YUE Chinese (Cantonese) yue-CN Language.YUE_CN Chinese (Cantonese) yue-CN Language.CS Czech cs-CZ Language.DA Danish da-DK Language.NL Dutch nl-NL Language.NL_BE Dutch (Belgium) nl-BE Language.EN English (US) en-US Language.EN_AU English (Australia) en-AU Language.EN_GB English (UK) en-GB Language.EN_IN English (India) en-IN Language.EN_NZ English (New Zealand) en-NZ Language.EN_US English (US) en-US Language.EN_ZA English (South Africa) en-ZA Language.FI Finnish fi-FI Language.FR French fr-FR Language.FR_BE French (Belgium) fr-BE Language.FR_CA French (Canada) fr-CA Language.DE German de-DE Language.DE_AT German (Austria) de-AT Language.DE_CH German (Switzerland) de-CH Language.HI Hindi hi-IN Language.IS Icelandic is-IS Language.IT Italian it-IT Language.JA Japanese ja-JP Language.KO Korean ko-KR Language.NO Norwegian nb-NO Language.NB Norwegian (Bokmål) nb-NO Language.NB_NO Norwegian (Bokmål) nb-NO Language.PL Polish pl-PL Language.PT Portuguese pt-PT Language.PT_BR Portuguese (Brazil) pt-BR Language.PT_PT Portuguese (Portugal) pt-PT Language.RO Romanian ro-RO Language.RU Russian ru-RU Language.ES Spanish es-ES Language.ES_MX Spanish (Mexico) es-MX Language.ES_US Spanish (US) es-US Language.SV Swedish sv-SE Language.TR Turkish tr-TR Language.CY Welsh cy-GB Language.CY_GB Welsh cy-GB Usage Example Copy Ask AI from pipecat.services.aws.tts import AWSPollyTTSService from pipecat.transcriptions.language import Language # Configure service using environment variables for credentials tts = AWSPollyTTSService( region = "us-west-2" , voice_id = "Joanna" , params = AWSPollyTTSService.InputParams( engine = "neural" , language = Language. EN , rate = "+10%" , volume = "loud" ) ) # Or provide credentials directly tts = AWSPollyTTSService( aws_access_key_id = "YOUR_ACCESS_KEY_ID" , api_key = "YOUR_SECRET_ACCESS_KEY" , region = "us-west-2" , voice_id = "Joanna" , params = AWSPollyTTSService.InputParams( engine = "generative" , # For newer generative voices language = Language. EN , rate = "1.1" # Generative engine rate format ) ) # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output(), ]) SSML Support The service automatically constructs SSML tags for advanced speech control: Copy Ask AI # Example with SSML controls service = AWSPollyTTSService( # ... other params ... 
params = AWSPollyTTSService.InputParams( engine = "neural" , rate = "+20%" , # Increase speed pitch = "low" , # Lower pitch volume = "loud" # Increase volume ) ) Prosody tags (pitch, rate, volume) have different behaviors based on the engine: - Standard engine: Supports all prosody tags - Neural engine: Full prosody support - Generative engine: Only rate is supported, with a different format (e.g., “1.1” for 10% faster) Frame Flow Metrics Support The service collects processing metrics: Time to First Byte (TTFB) Processing duration Character usage API calls Notes Supports all AWS Polly engines: Standard (non-neural voices) Neural (improved quality voices) Generative (high-quality, natural-sounding voices) Automatic audio resampling from 16kHz to any desired rate Thread-safe processing Automatic error handling Manages AWS client lifecycle Together AI Azure On this page Overview Installation Configuration Constructor Parameters Input Parameters Output Frames Control Frames Audio Frames Error Frames Methods Language Support Usage Example SSML Support Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
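The text_filter constructor parameter above has no example. A hedged sketch that attaches a markdown-stripping filter so LLM formatting isn't read aloud; the MarkdownTextFilter import path is an assumption and may differ by pipecat version:

import os

from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.utils.text.markdown_text_filter import MarkdownTextFilter  # path assumed

tts = AWSPollyTTSService(
    region=os.getenv("AWS_REGION", "us-east-1"),
    voice_id="Joanna",
    text_filter=MarkdownTextFilter(),  # strips markdown before synthesis
    params=AWSPollyTTSService.InputParams(engine="neural"),
)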
tts_aws_32c7001f.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/aws#param-tts-stopped-frame
Title: AWS Polly - Pipecat
==================================================

AWS Polly - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech AWS Polly Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview AWSPollyTTSService provides text-to-speech capabilities using AWS’s Polly service. It supports multiple voices, languages, and speech customization options through SSML. The older PollyTTSService class is still available but has been deprecated. Use AWSPollyTTSService instead. Installation To use AWSPollyTTSService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[aws]" You’ll also need to set up your AWS credentials as environment variables: AWS_SECRET_ACCESS_KEY AWS_ACCESS_KEY_ID AWS_SESSION_TOKEN (if using temporary credentials) AWS_REGION (defaults to “us-east-1”) Configuration Constructor Parameters api_key str AWS secret access key (can also use environment variable) aws_access_key_id str AWS access key ID (can also use environment variable) aws_session_token str AWS session token for temporary credentials (can also use environment variable) region str AWS region name (defaults to “us-east-1” if not provided) voice_id str default: "Joanna" AWS Polly voice identifier sample_rate int default: "None" Output audio sample rate in Hz (resampled from Polly’s 16kHz) text_filter BaseTextFilter default: "None" Modifies text provided to the TTS. Learn more about the available filters. params InputParams TTS configuration parameters Input Parameters Copy Ask AI class InputParams ( BaseModel ): engine: Optional[ str ] = None # Polly engine type ("standard", "neural", or "generative") language: Optional[Language] = Language. EN pitch: Optional[ str ] = None # SSML pitch adjustment rate: Optional[ str ] = None # SSML rate adjustment volume: Optional[ str ] = None # SSML volume adjustment Output Frames Control Frames TTSStartedFrame Frame Signals start of speech synthesis TTSStoppedFrame Frame Signals completion of speech synthesis Audio Frames TTSAudioRawFrame Frame Contains generated audio data with: PCM audio format Sample rate as specified (resampled from 16kHz) Single channel (mono) Error Frames ErrorFrame Frame Contains AWS Polly error information Methods See the TTS base class methods for additional functionality. 
Language Support Supports an extensive range of languages and regional variants: Language Code Description Service Code Language.AR Arabic arb Language.AR_AE Arabic (UAE) ar-AE Language.CA Catalan ca-ES Language.ZH Chinese (Mandarin) cmn-CN Language.YUE Chinese (Cantonese) yue-CN Language.YUE_CN Chinese (Cantonese) yue-CN Language.CS Czech cs-CZ Language.DA Danish da-DK Language.NL Dutch nl-NL Language.NL_BE Dutch (Belgium) nl-BE Language.EN English (US) en-US Language.EN_AU English (Australia) en-AU Language.EN_GB English (UK) en-GB Language.EN_IN English (India) en-IN Language.EN_NZ English (New Zealand) en-NZ Language.EN_US English (US) en-US Language.EN_ZA English (South Africa) en-ZA Language.FI Finnish fi-FI Language.FR French fr-FR Language.FR_BE French (Belgium) fr-BE Language.FR_CA French (Canada) fr-CA Language.DE German de-DE Language.DE_AT German (Austria) de-AT Language.DE_CH German (Switzerland) de-CH Language.HI Hindi hi-IN Language.IS Icelandic is-IS Language.IT Italian it-IT Language.JA Japanese ja-JP Language.KO Korean ko-KR Language.NO Norwegian nb-NO Language.NB Norwegian (Bokmål) nb-NO Language.NB_NO Norwegian (Bokmål) nb-NO Language.PL Polish pl-PL Language.PT Portuguese pt-PT Language.PT_BR Portuguese (Brazil) pt-BR Language.PT_PT Portuguese (Portugal) pt-PT Language.RO Romanian ro-RO Language.RU Russian ru-RU Language.ES Spanish es-ES Language.ES_MX Spanish (Mexico) es-MX Language.ES_US Spanish (US) es-US Language.SV Swedish sv-SE Language.TR Turkish tr-TR Language.CY Welsh cy-GB Language.CY_GB Welsh cy-GB Usage Example Copy Ask AI from pipecat.services.aws.tts import AWSPollyTTSService from pipecat.transcriptions.language import Language # Configure service using environment variables for credentials tts = AWSPollyTTSService( region = "us-west-2" , voice_id = "Joanna" , params = AWSPollyTTSService.InputParams( engine = "neural" , language = Language. EN , rate = "+10%" , volume = "loud" ) ) # Or provide credentials directly tts = AWSPollyTTSService( aws_access_key_id = "YOUR_ACCESS_KEY_ID" , api_key = "YOUR_SECRET_ACCESS_KEY" , region = "us-west-2" , voice_id = "Joanna" , params = AWSPollyTTSService.InputParams( engine = "generative" , # For newer generative voices language = Language. EN , rate = "1.1" # Generative engine rate format ) ) # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output(), ]) SSML Support The service automatically constructs SSML tags for advanced speech control: Copy Ask AI # Example with SSML controls service = AWSPollyTTSService( # ... other params ... 
params = AWSPollyTTSService.InputParams( engine = "neural" , rate = "+20%" , # Increase speed pitch = "low" , # Lower pitch volume = "loud" # Increase volume ) ) Prosody tags (pitch, rate, volume) have different behaviors based on the engine: - Standard engine: Supports all prosody tags - Neural engine: Full prosody support - Generative engine: Only rate is supported, with a different format (e.g., “1.1” for 10% faster) Frame Flow Metrics Support The service collects processing metrics: Time to First Byte (TTFB) Processing duration Character usage API calls Notes Supports all AWS Polly engines: Standard (non-neural voices) Neural (improved quality voices) Generative (high-quality, natural-sounding voices) Automatic audio resampling from 16kHz to any desired rate Thread-safe processing Automatic error handling Manages AWS client lifecycle Together AI Azure On this page Overview Installation Configuration Constructor Parameters Input Parameters Output Frames Control Frames Audio Frames Error Frames Methods Language Support Usage Example SSML Support Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
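Since the notes mention automatic resampling from Polly's native 16 kHz output, a small illustrative sketch for an 8 kHz telephony pipeline; the values are examples, not requirements:

from pipecat.services.aws.tts import AWSPollyTTSService
from pipecat.transcriptions.language import Language

tts = AWSPollyTTSService(
    region="us-west-2",
    voice_id="Joanna",
    sample_rate=8000,  # the service resamples Polly's 16 kHz output to 8 kHz
    params=AWSPollyTTSService.InputParams(engine="standard", language=Language.EN),
)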
tts_elevenlabs_d2c244cd.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/elevenlabs#param-url
Title: ElevenLabs - Pipecat
==================================================

ElevenLabs - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech ElevenLabs Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview ElevenLabs TTS provides high-quality text-to-speech synthesis through two service implementations: ElevenLabsTTSService : WebSocket-based implementation with word-level timing and interruption support ElevenLabsHttpTTSService : HTTP-based implementation for simpler use cases Installation To use ElevenLabsTTSService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[elevenlabs]" You’ll also need to set up your ElevenLabs API key as an environment variable: ELEVENLABS_API_KEY . You can obtain a ElevenLabs API key by signing up at ElevenLabs . ElevenLabsTTSService (WebSocket) Configuration api_key str required ElevenLabs API key voice_id str required Voice identifier model str default: "eleven_flash_v2_5" Model identifier url str default: "wss://api.elevenlabs.io" API endpoint URL sample_rate int default: "None" Output audio sample rate in Hz params InputParams default: "InputParams()" Additional configuration parameters text_filter BaseTextFilter default: "None" Modifies text provided to the TTS. Learn more about the available filters. InputParams language Language default: "None" The language of the text to be synthesized optimize_streaming_latency str default: "None" Optimization level for streaming latency stability float default: "None" Defines the stability for voice settings similarity_boost float default: "None" Defines the similarity boost for voice settings style float default: "None" Defines the style for voice settings. Available on V2+ models use_speaker_boost bool default: "None" Defines whether to use speaker boost for voice settings. Available on V2+ models speed float default: "None" Speech rate multiplier. Higher values increase speech speed auto_mode bool default: "True" This parameter focuses on reducing the latency by disabling the chunk schedule and buffers. Recommended when sending full sentences or phrases ElevenLabsHttpTTSService (HTTP) Configuration api_key str required ElevenLabs API key voice_id str required Voice identifier aiohttp_session aiohttp.ClientSession required aiohttp ClientSession for HTTP requests model str default: "eleven_flash_v2_5" Model identifier base_url str default: "https://api.elevenlabs.io" API base URL sample_rate int default: "None" Output audio sample rate in Hz params InputParams default: "InputParams()" Additional configuration parameters (similar to WebSocket implementation) Output Frames TTSStartedFrame Signals the start of audio generation. 
TTSAudioRawFrame Contains generated audio data: audio bytes Raw audio data chunk sample_rate int Audio sample rate num_channels int Number of audio channels (1 for mono) TTSStoppedFrame Signals the completion of audio generation. ErrorFrame (HTTP implementation) Sent when an error occurs during HTTP TTS generation: error str Error message describing what went wrong Usage Examples Basic Usage Copy Ask AI # Configure service tts = ElevenLabsTTSService( api_key = "your-api-key" , voice_id = "voice-id" , sample_rate = 24000 , params = ElevenLabsTTSService.InputParams( language = Language. EN ) ) # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output() ]) With Voice Settings Copy Ask AI # Configure with voice customization tts = ElevenLabsTTSService( api_key = "your-api-key" , voice_id = "voice-id" , params = ElevenLabsTTSService.InputParams( stability = 0.7 , similarity_boost = 0.8 , style = 0.5 , use_speaker_boost = True ) ) Methods See the TTS base class methods for additional functionality. Language Support ElevenLabs supports the following languages and their variants: Language Code Description Service Code Language.AR Arabic ar Language.BG Bulgarian bg Language.CS Czech cs Language.DA Danish da Language.DE German de Language.EL Greek el Language.EN English en Language.ES Spanish es Language.FI Finnish fi Language.FIL Filipino fil Language.FR French fr Language.HI Hindi hi Language.HR Croatian hr Language.HU Hungarian hu Language.ID Indonesian id Language.IT Italian it Language.JA Japanese ja Language.KO Korean ko Language.MS Malay ms Language.NL Dutch nl Language.NO Norwegian no Language.PL Polish pl Language.PT Portuguese pt Language.RO Romanian ro Language.RU Russian ru Language.SK Slovak sk Language.SV Swedish sv Language.TA Tamil ta Language.TR Turkish tr Language.UK Ukrainian uk Language.VI Vietnamese vi Language.ZH Chinese zh Note: Language support may vary based on the selected model. See the ElevenLabs docs for more details. Usage Example Copy Ask AI # Configure service with specific language service = ElevenLabsTTSService( api_key = "your-api-key" , voice_id = "voice-id" , params = ElevenLabsTTSService.InputParams( language = Language. FR # French ) ) Frame Flow Notes WebSocket implementation includes a 10-second keepalive mechanism Sample rate must be one of: 16000, 22050, 24000, or 44100 Hz Voice settings require both stability and similarity_boost to be set The language parameter only works with multilingual models WebSocket implementation pauses frame processing during speech generation HTTP implementation requires an external aiohttp ClientSession Deepgram Fish Audio On this page Overview Installation ElevenLabsTTSService (WebSocket) Configuration InputParams ElevenLabsHttpTTSService (HTTP) Configuration Output Frames TTSStartedFrame TTSAudioRawFrame TTSStoppedFrame ErrorFrame (HTTP implementation) Usage Examples Basic Usage With Voice Settings Methods Language Support Usage Example Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
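As a complement to the WebSocket example above, the following is a minimal sketch of wiring up ElevenLabsHttpTTSService with the external aiohttp session it requires. The import path pipecat.services.elevenlabs.tts, the helper name, and the voice-settings fields on the HTTP InputParams are assumptions based on the parameter tables above; verify them against your installed pipecat version.

import os
import aiohttp
from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService  # assumed import path
from pipecat.transcriptions.language import Language

async def make_elevenlabs_http_tts(session: aiohttp.ClientSession) -> ElevenLabsHttpTTSService:
    # The HTTP implementation does not manage its own connection pool,
    # so the caller creates (and later closes) the aiohttp session.
    return ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id="your-voice-id",
        aiohttp_session=session,
        sample_rate=24000,  # must be 16000, 22050, 24000, or 44100
        params=ElevenLabsHttpTTSService.InputParams(
            language=Language.EN,
            stability=0.7,
            similarity_boost=0.8,  # stability and similarity_boost must be set together
        ),
    )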
tts_elevenlabs_dfd65e53.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/elevenlabs#param-similarity-boost
Title: ElevenLabs - Pipecat
==================================================

tts_google_7e5164eb.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/google#param-voice-id
Title: Google - Pipecat
==================================================

Google - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech Google Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GoogleTTSService provides high-quality text-to-speech synthesis using Google Cloud’s Text-to-Speech API. It supports SSML for advanced voice control and multiple languages. Installation To use GoogleTTSService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[google]" You’ll also need to set up Google Cloud credentials through either: Environment variable: GOOGLE_APPLICATION_CREDENTIALS Direct credentials JSON Credentials file path Configuration Constructor Parameters credentials str | None Google Cloud credentials JSON string credentials_path str | None Path to credentials JSON file voice_id str default: "en-US-Neural2-A" Voice identifier sample_rate int default: "None" Output audio sample rate in Hz text_filter BaseTextFilter default: "None" Modifies text provided to the TTS. Learn more about the available filters. Input Parameters Copy Ask AI class InputParams ( BaseModel ): pitch: Optional[ str ] rate: Optional[ str ] volume: Optional[ str ] emphasis: Optional[Literal[ "strong" , "moderate" , "reduced" , "none" ]] language: Optional[Language] = Language. EN gender: Optional[Literal[ "male" , "female" , "neutral" ]] google_style: Optional[Literal[ "apologetic" , "calm" , "empathetic" , "firm" , "lively" ]] Output Frames Control Frames TTSStartedFrame Frame Signals start of synthesis TTSStoppedFrame Frame Signals completion of synthesis Audio Frames TTSAudioRawFrame Frame Contains generated audio data: - PCM encoded audio - Configured sample rate - Mono channel Error Frames ErrorFrame Frame Contains error information Usage Examples Basic Usage Copy Ask AI # Configure service tts = GoogleTTSService( credentials_path = "path/to/credentials.json" , voice_id = "en-US-Neural2-A" , params = GoogleTTSService.InputParams( language = Language. EN , gender = "female" , google_style = "empathetic" ) ) # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output(), ]) With SSML Controls Copy Ask AI # Configure with voice controls service = GoogleTTSService( credentials = credentials_json, params = GoogleTTSService.InputParams( pitch = "+2st" , rate = "1.2" , volume = "loud" , emphasis = "moderate" ) ) Methods See the TTS base class methods for additional functionality. 
Language Support Google Cloud Text-to-Speech supports the following languages and regional variants: Language Code Description Service Code Language.BG Bulgarian bg-BG Language.CA Catalan ca-ES Language.ZH Chinese (Mandarin) cmn-CN Language.ZH_TW Chinese (Taiwan) cmn-TW Language.CS Czech cs-CZ Language.DA Danish da-DK Language.NL Dutch (Netherlands) nl-NL Language.NL_BE Dutch (Belgium) nl-BE Language.EN English (US) en-US Language.EN_US English (US) en-US Language.EN_AU English (Australia) en-AU Language.EN_GB English (UK) en-GB Language.EN_IN English (India) en-IN Language.ET Estonian et-EE Language.FI Finnish fi-FI Language.FR French (France) fr-FR Language.FR_CA French (Canada) fr-CA Language.DE German de-DE Language.EL Greek el-GR Language.HI Hindi hi-IN Language.HU Hungarian hu-HU Language.ID Indonesian id-ID Language.IT Italian it-IT Language.JA Japanese ja-JP Language.KO Korean ko-KR Language.LV Latvian lv-LV Language.LT Lithuanian lt-LT Language.MS Malay ms-MY Language.NO Norwegian nb-NO Language.PL Polish pl-PL Language.PT Portuguese (Portugal) pt-PT Language.PT_BR Portuguese (Brazil) pt-BR Language.RO Romanian ro-RO Language.RU Russian ru-RU Language.SK Slovak sk-SK Language.ES Spanish es-ES Language.SV Swedish sv-SE Language.TH Thai th-TH Language.TR Turkish tr-TR Language.UK Ukrainian uk-UA Language.VI Vietnamese vi-VN Usage Example Copy Ask AI # Configure service with specific language and region service = GoogleTTSService( credentials_path = "path/to/credentials.json" , voice_id = "en-US-Neural2-A" , params = GoogleTTSService.InputParams( language = Language. EN_GB , # British English gender = "female" ) ) Regional Considerations Each language code includes both language and region (e.g., fr-FR for French in France) Some languages have multiple regional variants (e.g., English has US, UK, Australian, and Indian variants) Voice availability may vary by region Neural voices may not be available for all language/region combinations Note: Voice selection should match the specified language code for optimal results. Frame Flow Notes Supports SSML markup Multiple voice styles Gender selection Prosody control Emphasis levels Regional language variants Metrics collection Chunked audio output Thread-safe processing Fish Audio Groq On this page Overview Installation Configuration Constructor Parameters Input Parameters Output Frames Control Frames Audio Frames Error Frames Usage Examples Basic Usage With SSML Controls Methods Language Support Usage Example Regional Considerations Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
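To tie the regional-variant table to the SSML controls above, the sketch below selects a British English voice and adjusts prosody. The import path pipecat.services.google.tts and the specific voice name are assumptions; choose any voice whose name matches the language code you request.

import os
from pipecat.services.google.tts import GoogleTTSService  # assumed import path
from pipecat.transcriptions.language import Language

tts = GoogleTTSService(
    # Reads the service-account file pointed to by GOOGLE_APPLICATION_CREDENTIALS.
    credentials_path=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
    voice_id="en-GB-Neural2-A",  # example voice; confirm availability for your project
    params=GoogleTTSService.InputParams(
        language=Language.EN_GB,  # must match the voice's language code
        gender="female",
        rate="1.1",      # slightly faster speech
        pitch="+1st",    # raise pitch by one semitone
        emphasis="moderate",
    ),
)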
tts_groq_ec887456.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/groq#language-support
Title: Groq - Pipecat
==================================================

Groq - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech Groq Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview GroqTTSService converts text to speech using Groq’s TTS API. It supports real-time audio generation with multiple voices. Installation To use GroqTTSService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[groq]" You’ll also need to set up your Groq API key as an environment variable: GROQ_API_KEY . You can obtain a Groq Cloud API key by signing up at Groq . Configuration Constructor Parameters api_key str required Your Groq API key output_format str default: "wav" Audio output format params InputParams default: "InputParams()" Configuration parameters for speech generation model_name str default: "playai-tts" TTS model to use. See the Groq Cloud docs for available models . voice_id str default: "Celeste-PlayAI" Voice identifier to use for synthesis Input Parameters language Language default: "Language.EN" Language for speech synthesis speed float default: "1.0" Speech rate multiplier (higher values produce faster speech) seed Optional[int] default: "None" Random seed for reproducible audio generation Input The service accepts text input through the pipeline, including streaming text from an LLM service. Output Frames TTSStartedFrame Signals the start of audio generation. TTSAudioRawFrame Contains generated audio data: audio bytes Raw audio data chunk sample_rate int Audio sample rate, based on the constructor setting num_channels int Number of audio channels (1 for mono) TTSStoppedFrame Signals the completion of audio generation. Methods See the TTS base class methods for additional functionality. Language Support GroqTTSService supports the following languages: Language Code Description Service Codes Language.EN English en Usage Example Copy Ask AI from pipecat.services.groq.tts import GroqTTSService from pipecat.transcriptions.language import Language # Configure service tts = GroqTTSService( api_key = "your-api-key" , model_name = "playai-tts" , voice_id = "Celeste-PlayAI" , params = GroqTTSService.InputParams( language = Language. EN , speed = 1.0 , seed = 42 ) ) # Use in pipeline pipeline = Pipeline([ ... 
, llm, tts, transport.output(), ]) Frame Flow Metrics Support The service supports metrics collection: Time to First Byte (TTFB) Processing duration Audio Processing Streams audio in chunks Outputs mono audio at the defined sample rate Handles WAV header removal automatically Supports WAV format by default Notes Requires a Groq Cloud API key Streams audio in chunks for efficient processing Automatically handles WAV headers in the response Provides metrics collection Supports configurable speech parameters Google LMNT On this page Overview Installation Configuration Constructor Parameters Input Parameters Input Output Frames TTSStartedFrame TTSAudioRawFrame TTSStoppedFrame Methods Language Support Usage Example Frame Flow Metrics Support Audio Processing Notes Assistant Responses are generated using AI and may contain mistakes.
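Building on the usage example above, this sketch loads the API key from the GROQ_API_KEY environment variable instead of hard-coding it and keeps the documented defaults for everything else; the factory name is illustrative, not part of the Pipecat API.

import os
from pipecat.services.groq.tts import GroqTTSService
from pipecat.transcriptions.language import Language

def make_groq_tts() -> GroqTTSService:
    # Illustrative factory: keeps the credential out of source code.
    return GroqTTSService(
        api_key=os.getenv("GROQ_API_KEY"),
        model_name="playai-tts",
        voice_id="Celeste-PlayAI",
        params=GroqTTSService.InputParams(
            language=Language.EN,  # English is the only language listed above
            speed=1.0,
        ),
    )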
tts_minimax_aef6f242.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/minimax#frame-flow
Title: MiniMax - Pipecat
==================================================

MiniMax - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech MiniMax Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview MiniMaxHttpTTSService provides text-to-speech capabilities using MiniMax’s T2A (Text-to-Audio) API. It supports multiple voices, emotions, languages, and speech customization options. Installation To use MiniMaxHttpTTSService , no additional dependencies are required. You’ll also need MiniMax API credentials (API key and Group ID). Configuration Constructor Parameters api_key str required MiniMax API key for authentication group_id str required MiniMax Group ID to identify your project model str default: "speech-02-turbo" MiniMax TTS model to use. Available options include: speech-02-hd : HD model with superior rhythm and stability speech-02-turbo : Turbo model with enhanced multilingual capabilities speech-01-hd : Rich voices with expressive emotions speech-01-turbo : Low-latency model with regular updates voice_id str default: "Calm_Woman" MiniMax voice identifier. Options include: Wise_Woman Friendly_Person Inspirational_girl Deep_Voice_Man Calm_Woman Casual_Guy Lively_Girl Patient_Man Young_Knight Determined_Man Lovely_Girl Decent_Boy Imposing_Manner Elegant_Man Abbess Sweet_Girl_2 Exuberant_Girl See the MiniMax documentation for a complete list of available voices. aiohttp_session aiohttp.ClientSession required Aiohttp session for API communication sample_rate int default: "None" Output audio sample rate in Hz params InputParams TTS configuration parameters Input Parameters language Language default: "Language.EN" Language for TTS generation speed float default: "1.0" Speech speed (range: 0.5 to 2.0). Values greater than 1.0 increase speed, less than 1.0 decrease speed. volume float default: "1.0" Speech volume (range: 0 to 10). Values greater than 1.0 increase volume. pitch float default: "0" Pitch adjustment (range: -12 to 12). Positive values raise pitch, negative values lower pitch. emotion str Emotional tone of the speech. Options include: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, and “neutral”. english_normalization bool Whether to apply English text normalization, which improves performance in number-reading scenarios at the cost of slightly increased latency. Output Frames Control Frames TTSStartedFrame Frame Signals start of speech synthesis TTSStoppedFrame Frame Signals completion of speech synthesis Audio Frames TTSAudioRawFrame Frame Contains generated audio data with: PCM audio format Sample rate as specified Single channel (mono) Error Frames ErrorFrame Frame Contains MiniMax API error information Methods See the TTS base class methods for additional functionality. 
Language Support Supports a wide range of languages through the language_boost parameter: Language Code Service Code Description Language.AR Arabic Arabic Language.CS Czech Czech Language.DE German German Language.EL Greek Greek Language.EN English English Language.ES Spanish Spanish Language.FI Finnish Finnish Language.FR French French Language.HI Hindi Hindi Language.ID Indonesian Indonesian Language.IT Italian Italian Language.JA Japanese Japanese Language.KO Korean Korean Language.NL Dutch Dutch Language.PL Polish Polish Language.PT Portuguese Portuguese Language.RO Romanian Romanian Language.RU Russian Russian Language.TH Thai Thai Language.TR Turkish Turkish Language.UK Ukrainian Ukrainian Language.VI Vietnamese Vietnamese Language.YUE Chinese,Yue Chinese (Cantonese) Language.ZH Chinese Chinese (Mandarin) Usage Example Copy Ask AI import aiohttp import os from pipecat.services.minimax.tts import MiniMaxHttpTTSService from pipecat.transcriptions.language import Language async def create_tts_service (): # Create an HTTP session session = aiohttp.ClientSession() # Configure service with credentials tts = MiniMaxHttpTTSService( api_key = os.getenv( "MINIMAX_API_KEY" ), group_id = os.getenv( "MINIMAX_GROUP_ID" ), model = "speech-02-turbo" , voice_id = "Patient_Man" , aiohttp_session = session, params = MiniMaxHttpTTSService.InputParams( language = Language. EN , speed = 1.1 , # Slightly faster speech volume = 1.2 , # Slightly louder pitch = 0 , # Default pitch emotion = "neutral" # Neutral emotional tone ) ) return tts # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output(), ]) Frame Flow Metrics Support The service collects processing metrics: Time to First Byte (TTFB) Processing duration Character usage Notes Uses streaming audio generation for faster initial response Processes audio in chunks for efficient memory usage Supports real-time applications with low latency Automatically handles API authentication Provides PCM audio compatible with most audio pipelines LMNT Neuphonic On this page Overview Installation Configuration Constructor Parameters Input Parameters Output Frames Control Frames Audio Frames Error Frames Methods Language Support Usage Example Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
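The usage example above creates its own ClientSession; in a larger application you may prefer to pass in a session owned by the caller so it can be closed cleanly on shutdown. The helper below is a sketch of that pattern, and the model, voice, and parameter values are illustrative picks from the tables above.

import os
import aiohttp
from pipecat.services.minimax.tts import MiniMaxHttpTTSService
from pipecat.transcriptions.language import Language

async def make_minimax_tts(session: aiohttp.ClientSession) -> MiniMaxHttpTTSService:
    # The caller owns the aiohttp session and closes it when the app shuts down.
    return MiniMaxHttpTTSService(
        api_key=os.getenv("MINIMAX_API_KEY"),
        group_id=os.getenv("MINIMAX_GROUP_ID"),
        model="speech-02-hd",        # HD model; use speech-02-turbo for lower latency
        voice_id="Wise_Woman",
        aiohttp_session=session,
        params=MiniMaxHttpTTSService.InputParams(
            language=Language.ZH,          # Mandarin
            pitch=-2,                      # slightly lower voice (range -12 to 12)
            emotion="happy",
            english_normalization=True,    # better number reading in mixed-language text
        ),
    )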
tts_neuphonic_5f36257f.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/neuphonic#param-params-1
Title: Neuphonic - Pipecat
==================================================

Neuphonic - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech Neuphonic Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview Neuphonic provides high-quality text-to-speech synthesis through two service implementations: NeuphonicTTSService : WebSocket-based implementation with interruption support NeuphonicHttpTTSService : HTTP-based implementation for simpler use cases Both services support various voices, languages, and customization options. Installation To use Neuphonic TTS services, install the required dependencies: Copy Ask AI pip install "pipecat-ai[neuphonic]" You’ll also need to set up your Neuphonic API key as an environment variable: NEUPHONIC_API_KEY NeuphonicTTSService (WebSocket) Configuration api_key str required Your Neuphonic API key voice_id str default: "None" Voice identifier to use for synthesis url str default: "wss://api.neuphonic.com" Neuphonic WebSocket API endpoint sample_rate int default: "22050" Output audio sample rate in Hz encoding str default: "pcm_linear" Audio encoding format params InputParams default: "InputParams()" Additional configuration parameters InputParams language Language default: "Language.EN" The language for TTS generation speed float default: "1.0" Speech speed multiplier (0.5-2.0) NeuphonicHttpTTSService (HTTP) Configuration api_key str required Your Neuphonic API key voice_id str default: "None" Voice identifier to use for synthesis url str default: "https://api.neuphonic.com" Neuphonic HTTP API endpoint sample_rate int default: "22050" Output audio sample rate in Hz encoding str default: "pcm_linear" Audio encoding format params InputParams default: "InputParams()" Additional configuration parameters (same as WebSocket implementation) Input Both services accept text input through their TTS pipeline. Output Frames TTSStartedFrame Signals the start of audio generation. TTSAudioRawFrame Contains generated audio data: audio bytes Raw audio data chunk sample_rate int Audio sample rate (22050Hz default) num_channels int Number of audio channels (1 for mono) TTSStoppedFrame Signals the completion of audio generation. 
ErrorFrame Sent when an error occurs during TTS generation: error str Error message describing what went wrong Methods WebSocket Implementation The WebSocket implementation ( NeuphonicTTSService ) inherits from InterruptibleTTSService and provides: Support for interrupting ongoing TTS generation Automatic websocket connection management Keep-alive mechanism for persistent connections Special handling for conversation flows HTTP Implementation The HTTP implementation ( NeuphonicHttpTTSService ) inherits from TTSService and provides: Simpler API integration using HTTP streaming Less overhead for single TTS requests Simplified error handling Language Support Neuphonic TTS supports the following languages: Language Code Description Service Codes Language.EN English en Language.ES Spanish es Language.DE German de Language.NL Dutch nl Language.AR Arabic ar Language.FR French fr Language.PT Portuguese pt Language.RU Russian ru Language.HI Hindi hi Language.ZH Chinese zh Regional variants (e.g., EN_US , ES_ES ) are automatically mapped to their base language. Usage Example WebSocket Implementation Copy Ask AI from pipecat.services.neuphonic.tts import NeuphonicTTSService from pipecat.transcriptions.language import Language # Configure service tts = NeuphonicTTSService( api_key = "your-neuphonic-api-key" , voice_id = "preferred-voice-id" , params = NeuphonicTTSService.InputParams( language = Language. EN , speed = 1.2 ) ) # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output(), ]) HTTP Implementation Copy Ask AI from pipecat.services.neuphonic.tts import NeuphonicHttpTTSService from pipecat.transcriptions.language import Language # Configure service tts = NeuphonicHttpTTSService( api_key = "your-neuphonic-api-key" , voice_id = "preferred-voice-id" , params = NeuphonicHttpTTSService.InputParams( language = Language. ES , speed = 1.0 ) ) # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output(), ]) Metrics Support Both services support metrics collection: Time to First Byte (TTFB) TTS usage metrics Processing duration Audio Processing Configurable sample rate (defaults to 22050Hz) PCM linear encoding Single channel (mono) output Base64 decoding for audio data Notes WebSocket implementation includes a keep-alive mechanism (10-second interval) WebSocket service maintains a persistent connection for faster responses Both services automatically select appropriate language codes The WebSocket implementation pauses frame processing during speech generation to prevent overlapping responses MiniMax NVIDIA Riva On this page Overview Installation NeuphonicTTSService (WebSocket) Configuration InputParams NeuphonicHttpTTSService (HTTP) Configuration Input Output Frames TTSStartedFrame TTSAudioRawFrame TTSStoppedFrame ErrorFrame Methods WebSocket Implementation HTTP Implementation Language Support Usage Example WebSocket Implementation HTTP Implementation Metrics Support Audio Processing Notes Assistant Responses are generated using AI and may contain mistakes.
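Because the WebSocket and HTTP services share the same constructor shape, a small factory can switch between them: the WebSocket variant for interruptible conversations, the HTTP variant for one-shot synthesis. This is an illustrative sketch under that assumption, not part of the Pipecat API.

import os
from pipecat.services.neuphonic.tts import NeuphonicHttpTTSService, NeuphonicTTSService
from pipecat.transcriptions.language import Language

def make_neuphonic_tts(streaming: bool = True):
    # WebSocket service supports interruptions; HTTP service is simpler for single requests.
    cls = NeuphonicTTSService if streaming else NeuphonicHttpTTSService
    return cls(
        api_key=os.getenv("NEUPHONIC_API_KEY"),
        voice_id="preferred-voice-id",
        params=cls.InputParams(language=Language.EN, speed=1.0),
    )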
tts_neuphonic_792be297.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/neuphonic#param-voice-id
Title: Neuphonic - Pipecat
==================================================

tts_playht_24fee44a.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/playht#metrics-support
Title: PlayHT - Pipecat
==================================================

PlayHT - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech PlayHT Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview PlayHT provides two TTS service implementations: PlayHTTTSService : WebSocket-based service with real-time streaming PlayHTHttpTTSService : HTTP-based service for simpler, non-streaming synthesis Installation To use PlayHT services, install the required dependencies: Copy Ask AI pip install "pipecat-ai[playht]" You’ll also need to set up your PlayHT credentials as environment variables: PLAY_HT_USER_ID PLAY_HT_API_KEY PlayHTTTSService WebSocket-based implementation supporting real-time streaming synthesis. Constructor Parameters api_key str required PlayHT API key user_id str required PlayHT user ID voice_url str required Voice identifier URL voice_engine str default: "PlayHT3.0-mini" TTS engine identifier. See the PlayHT docs for available engines. sample_rate int default: "None" Output audio sample rate in Hz output_format str default: "wav" Audio output format text_filter BaseTextFilter default: "None" Modifies text provided to the TTS. Learn more about the available filters. Input Parameters Copy Ask AI class InputParams ( BaseModel ): language: Optional[Language] = Language. EN speed: Optional[ float ] = 1.0 seed: Optional[ int ] = None PlayHTHttpTTSService HTTP-based implementation for simpler synthesis requirements. Constructor Parameters api_key str required PlayHT API key user_id str required PlayHT user ID voice_url str required Voice identifier URL voice_engine str default: "Play3.0-mini-http" TTS engine identifier. The PlayHTHttpTTSService supports either Play3.0-mini-http or Play3.0-mini-ws . sample_rate int default: "None" Output audio sample rate in Hz Input Parameters Copy Ask AI class InputParams ( BaseModel ): language: Optional[Language] = Language. EN speed: Optional[ float ] = 1.0 seed: Optional[ int ] = None Output Frames Control Frames TTSStartedFrame Frame Signals start of synthesis TTSStoppedFrame Frame Signals completion of synthesis Audio Frames TTSAudioRawFrame Frame Contains generated audio data with: - WAV format - Specified sample rate - Single channel (mono) Error Frames ErrorFrame Frame Contains PlayHT error information Methods See the TTS base class methods for additional functionality. 
Language Support Supports multiple languages when using the PlayHT3.0-mini engine: Language Code Description Service Code Language.AF Afrikaans afrikans Language.AM Amharic amharic Language.AR Arabic arabic Language.BN Bengali bengali Language.BG Bulgarian bulgarian Language.CA Catalan catalan Language.CS Czech czech Language.DA Danish danish Language.DE German german Language.EL Greek greek Language.EN English english Language.ES Spanish spanish Language.FR French french Language.GL Galician galician Language.HE Hebrew hebrew Language.HI Hindi hindi Language.HR Croatian croatian Language.HU Hungarian hungarian Language.ID Indonesian indonesian Language.IT Italian italian Language.JA Japanese japanese Language.KO Korean korean Language.MS Malay malay Language.NL Dutch dutch Language.PL Polish polish Language.PT Portuguese portuguese Language.RU Russian russian Language.SQ Albanian albanian Language.SR Serbian serbian Language.SV Swedish swedish Language.TH Thai thai Language.TL Tagalog tagalog Language.TR Turkish turkish Language.UK Ukrainian ukrainian Language.UR Urdu urdu Language.XH Xhosa xhosa Language.ZH Mandarin mandarin See the PlayHT docs for a complete list of languages and options. Usage Examples WebSocket Service Copy Ask AI # Configure WebSocket service ws_service = PlayHTTTSService( api_key = "your-api-key" , user_id = "your-user-id" , voice_url = "voice-url" , voice_engine = "PlayHT3.0-mini" , params = PlayHTTTSService.InputParams( language = Language. EN , speed = 1.2 ) ) # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output(), ]) HTTP Service Copy Ask AI # Configure HTTP service http_service = PlayHTHttpTTSService( api_key = "your-api-key" , user_id = "your-user-id" , voice_url = "voice-url" , voice_engine = "PlayHT3.0-mini" , params = PlayHTHttpTTSService.InputParams( language = Language. EN , speed = 1.0 ) ) Frame Flow WebSocket Service HTTP Service Metrics Support Both services collect processing metrics: Time to First Byte (TTFB) Processing duration Character usage API calls Notes WebSocket Service Real-time streaming support Automatic reconnection Interruption handling WAV header management Thread-safe processing HTTP Service Simpler implementation Complete audio delivery WAV header parsing Chunked audio delivery Lower latency for short texts Common Features Multiple voice engines Speed control Language support Seed-based consistency Error handling Metrics collection Piper Rime On this page Overview Installation PlayHTTTSService Constructor Parameters Input Parameters PlayHTHttpTTSService Constructor Parameters Input Parameters Output Frames Control Frames Audio Frames Error Frames Methods Language Support Usage Examples WebSocket Service HTTP Service Frame Flow WebSocket Service HTTP Service Metrics Support Notes WebSocket Service HTTP Service Common Features Assistant Responses are generated using AI and may contain mistakes.
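One detail worth calling out: the constructor table above says PlayHTHttpTTSService expects a Play3.0-mini-http or Play3.0-mini-ws engine, while the HTTP usage example passes PlayHT3.0-mini. The sketch below follows the constructor table; the import path and the placeholder voice URL are assumptions.

import os
from pipecat.services.playht.tts import PlayHTHttpTTSService  # assumed import path
from pipecat.transcriptions.language import Language

http_tts = PlayHTHttpTTSService(
    api_key=os.getenv("PLAY_HT_API_KEY"),
    user_id=os.getenv("PLAY_HT_USER_ID"),
    voice_url="your-voice-manifest-url",   # placeholder; use a voice URL from your PlayHT account
    voice_engine="Play3.0-mini-http",      # engine accepted by the HTTP service per the table above
    params=PlayHTHttpTTSService.InputParams(
        language=Language.EN,
        speed=1.0,
    ),
)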
tts_sarvam_50614782.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/sarvam#configuration
Title: Sarvam AI - Pipecat
==================================================

Sarvam AI - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Text-to-Speech Sarvam AI Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech AWS Polly Azure Cartesia Deepgram ElevenLabs Fish Audio Google Groq LMNT MiniMax Neuphonic NVIDIA Riva OpenAI Piper PlayHT Rime Sarvam AI XTTS Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview SarvamTTSService converts text to speech using Sarvam AI’s TTS API. It specializes in Indian languages and provides extensive voice customization options including pitch, pace, and loudness control. Installation To use SarvamTTSService , no additional dependencies are required. You’ll also need to set up your Sarvam AI API key as an environment variable: SARVAM_API_KEY Configuration Constructor Parameters api_key str required Your Sarvam AI API subscription key voice_id str default: "anushka" Speaker voice identifier (e.g., “anushka”, “meera”, “abhilash”) model str default: "bulbul:v2" TTS model to use (“bulbul:v1” or “bulbul:v2”) aiohttp_session aiohttp.ClientSession required Shared aiohttp session for making HTTP requests base_url str default: "https://api.sarvam.ai" Sarvam AI API base URL sample_rate int default: "None" Audio sample rate in Hz (8000, 16000, 22050, 24000) params InputParams default: "None" Additional voice and preprocessing parameters InputParams Configuration language Language default: "Language.HI" Target language for synthesis pitch float default: "0.0" Voice pitch adjustment (-0.75 to 0.75) pace float default: "1.0" Speech speed (0.3 to 3.0) loudness float default: "1.0" Audio volume (0.1 to 3.0) enable_preprocessing bool default: "False" Enable text normalization for mixed-language content Input The service accepts text input through its TTS pipeline with automatic WAV header stripping for clean PCM output. Output Frames TTSStartedFrame Signals the start of audio generation. TTSAudioRawFrame Contains generated audio data: audio bytes Raw PCM audio data (WAV header stripped) sample_rate int Audio sample rate (22050Hz default) num_channels int Number of audio channels (1 for mono) TTSStoppedFrame Signals the completion of audio generation. Methods See the TTS base class methods for additional functionality. Language Support Sarvam AI TTS supports the following Indian languages: Language Code Description Service Code Language.BN Bengali bn-IN Language.EN English (India) en-IN Language.GU Gujarati gu-IN Language.HI Hindi hi-IN Language.KN Kannada kn-IN Language.ML Malayalam ml-IN Language.MR Marathi mr-IN Language.OR Odia od-IN Language.PA Punjabi pa-IN Language.TA Tamil ta-IN Language.TE Telugu te-IN Voice Models See the Sarvam docs for the latest information on available voices and models. 
Usage Example Copy Ask AI from pipecat.services.sarvam.tts import SarvamTTSService from pipecat.transcriptions.language import Language import aiohttp # Configure service async with aiohttp.ClientSession() as session: tts = SarvamTTSService( api_key = "your-api-key" , voice_id = "anushka" , model = "bulbul:v2" , aiohttp_session = session, params = SarvamTTSService.InputParams( language = Language. HI , ) ) # Use in pipeline pipeline = Pipeline([ ... , llm, tts, transport.output(), ]) Frame Flow Metrics Support The service supports metrics collection: Time to First Byte (TTFB) TTS usage metrics Processing duration Audio Processing Returns base64-encoded WAV audio from API Supports multiple sample rates (8000, 16000, 22050, 24000 Hz) Generates mono audio output Handles HTTP-based synthesis Notes Requires valid Sarvam AI API subscription key Specializes in Indian languages and voices Uses HTTP POST requests for synthesis Thread-safe HTTP session management required Rime XTTS On this page Overview Installation Configuration Constructor Parameters InputParams Configuration Input Output Frames TTSStartedFrame TTSAudioRawFrame TTSStoppedFrame Methods Language Support Voice Models Usage Example Frame Flow Metrics Support Audio Processing Notes Assistant Responses are generated using AI and may contain mistakes.
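To illustrate the pitch, pace, and loudness ranges documented above, here is a variant of the usage example with those controls set explicitly; the helper name and the chosen values are illustrative.

import os
import aiohttp
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.transcriptions.language import Language

async def make_sarvam_tts(session: aiohttp.ClientSession) -> SarvamTTSService:
    # Caller-owned aiohttp session, as required by the constructor.
    return SarvamTTSService(
        api_key=os.getenv("SARVAM_API_KEY"),
        voice_id="abhilash",
        model="bulbul:v2",
        aiohttp_session=session,
        sample_rate=16000,               # one of 8000, 16000, 22050, 24000
        params=SarvamTTSService.InputParams(
            language=Language.TA,        # Tamil
            pitch=0.2,                   # within -0.75 to 0.75
            pace=0.9,                    # slightly slower, within 0.3 to 3.0
            loudness=1.5,                # louder, within 0.1 to 3.0
            enable_preprocessing=True,   # normalize mixed-language text
        ),
    )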
tts_sarvam_920aa54f.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/tts/sarvam#param-sample-rate
Title: Sarvam AI - Pipecat
==================================================

utilities_opentelemetry_1f6781e7.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/opentelemetry#conversation-spans
Title: OpenTelemetry Tracing - Pipecat
==================================================

OpenTelemetry Tracing - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Metrics and Telemetry OpenTelemetry Tracing Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry OpenTelemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview Pipecat includes built-in support for OpenTelemetry tracing, allowing you to gain deep visibility into your voice applications. Tracing helps you: Track latency and performance across your conversation pipeline Monitor service health and identify bottlenecks Visualize conversation turns and service dependencies Collect usage metrics and operational analytics Installation To use OpenTelemetry tracing with Pipecat, install the tracing dependencies: Copy Ask AI pip install "pipecat-ai[tracing]" For local development and testing, we recommend using Jaeger as a trace collector. You can run it with Docker: Copy Ask AI docker run -d --name jaeger \ -p 16686:16686 \ -p 4317:4317 \ -p 4318:4318 \ jaegertracing/all-in-one:latest Then access the UI at http://localhost:16686 Basic Setup Enabling tracing in your Pipecat application requires two steps: Initialize the OpenTelemetry SDK with your preferred exporter Enable tracing in your PipelineTask Copy Ask AI import os from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter from pipecat.utils.tracing.setup import setup_tracing from pipecat.pipeline.task import PipelineTask, PipelineParams # Step 1: Initialize OpenTelemetry with your chosen exporter exporter = OTLPSpanExporter( endpoint = "http://localhost:4317" , # Jaeger or other collector endpoint insecure = True , ) setup_tracing( service_name = "my-voice-app" , exporter = exporter, console_export = False , # Set to True for debug output ) # Step 2: Enable tracing in your PipelineTask task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , # Required for some service metrics ), enable_tracing = True , # Enable tracing for this task enable_turn_tracking = True , # Enable turn tracking for this task conversation_id = "customer-123" , # Optional - will auto-generate if not provided additional_span_attributes = { "session.id" : "abc-123" } # Optional - additional attributes to attach to the otel span ) For complete working examples, see our sample implementations: Jaeger Tracing Example - Uses gRPC exporter with Jaeger Langfuse Tracing Example - Uses HTTP exporter with Langfuse for LLM-focused observability Trace Structure Pipecat organizes traces hierarchically, following the natural structure of conversations: Copy Ask AI Conversation (conversation) ├── turn │ ├── stt │ ├── llm │ └── tts └── turn ├── stt ├── llm └── tts turn... 
For real-time multimodal services like Gemini Live and OpenAI Realtime, the structure adapts to their specific patterns:

Conversation (conversation)
├── turn
│   ├── llm_setup (session configuration)
│   ├── stt (user input)
│   ├── llm_response (complete response with usage)
│   └── llm_tool_call/llm_tool_result (for function calls)
└── turn
    ├── stt (user input)
    └── llm_response (complete response)
turn...

This hierarchical structure makes it easy to:
- Track the full lifecycle of a conversation
- Measure latency for individual turns
- Identify which services are contributing to delays
- Compare performance across different conversations

Exporter Options

Pipecat supports any OpenTelemetry-compatible exporter. Common options include:

OTLP Exporter (for Jaeger, Grafana, etc.)

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="http://localhost:4317",  # Your collector endpoint
    insecure=True,  # Use False for TLS connections
)

HTTP OTLP Exporter (for Langfuse, etc.)

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    # Configure with environment variables:
    # OTEL_EXPORTER_OTLP_ENDPOINT
    # OTEL_EXPORTER_OTLP_HEADERS
)

See our Langfuse example for details on configuring this exporter. A sketch of setting these environment variables appears after the span attribute lists below.

Console Exporter (for debugging)

The console exporter can be enabled alongside any other exporter by setting console_export=True:

setup_tracing(
    service_name="my-voice-app",
    exporter=otlp_exporter,
    console_export=True,  # Prints traces to stdout
)

Cloud Provider Exporters

Many cloud providers offer OpenTelemetry-compatible observability services: AWS X-Ray, Google Cloud Trace, Azure Monitor, and Datadog APM. Check the OpenTelemetry documentation for specific exporter configurations: OpenTelemetry Vendors

Span Attributes

Pipecat enriches spans with detailed attributes about service operations:

TTS Service Spans
- gen_ai.system: Service provider (e.g., "cartesia")
- gen_ai.request.model: Model ID/name
- voice_id: Voice identifier
- text: The text being synthesized
- metrics.character_count: Number of characters in the text
- metrics.ttfb: Time to first byte in seconds
- settings.*: Service-specific configuration parameters

STT Service Spans
- gen_ai.system: Service provider (e.g., "deepgram")
- gen_ai.request.model: Model ID/name
- transcript: The transcribed text
- is_final: Whether the transcription is final
- language: Detected or configured language
- vad_enabled: Whether voice activity detection is enabled
- metrics.ttfb: Time to first byte in seconds
- settings.*: Service-specific configuration parameters

LLM Service Spans
- gen_ai.system: Service provider (e.g., "openai", "gcp.gemini")
- gen_ai.request.model: Model ID/name
- gen_ai.operation.name: Operation type (e.g., "chat")
- stream: Whether streaming is enabled
- input: JSON-serialized input messages
- output: Complete response text
- tools: JSON-serialized tools configuration
- tools.count: Number of tools available
- tools.names: Comma-separated tool names
- system: System message content
- gen_ai.usage.input_tokens: Number of prompt tokens
- gen_ai.usage.output_tokens: Number of completion tokens
- metrics.ttfb: Time to first byte in seconds
- gen_ai.request.*: Standard parameters (temperature, max_tokens, etc.)
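Returning to the HTTP OTLP exporter described above: the page notes that it is configured through OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS. The following is a minimal sketch of setting those variables in Python before creating the exporter. The Langfuse endpoint, the LANGFUSE_* variable names, and the basic-auth scheme are assumptions for illustration; consult your collector's documentation for the actual values.

import base64
import os

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Hypothetical credentials - replace with values from your Langfuse project settings.
public_key = os.getenv("LANGFUSE_PUBLIC_KEY", "pk-...")
secret_key = os.getenv("LANGFUSE_SECRET_KEY", "sk-...")
auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()

# Assumed Langfuse EU cloud OTLP endpoint; adjust region/self-hosted URL as needed.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
# %20 percent-encodes the space, as required by the OTLP env-var header format.
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic%20{auth}"

# With no arguments, the HTTP exporter reads the environment variables set above.
exporter = OTLPSpanExporter()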
Multimodal Service Spans (Gemini Live & OpenAI Realtime)

Setup Spans
- gen_ai.system: "gcp.gemini" or "openai"
- gen_ai.request.model: Model identifier
- tools.count: Number of available tools
- tools.definitions: JSON-serialized tool schemas
- system_instruction: System prompt (truncated)
- session.*: Session configuration parameters

Request Spans (OpenAI Realtime)
- input: JSON-serialized context messages being sent
- gen_ai.operation.name: "llm_request"

Response Spans
- output: Complete assistant response text
- output_modality: "TEXT" or "AUDIO" (Gemini Live)
- gen_ai.usage.input_tokens: Prompt tokens used
- gen_ai.usage.output_tokens: Completion tokens generated
- function_calls.count: Number of function calls made
- function_calls.names: Comma-separated function names
- metrics.ttfb: Time to first response in seconds

Tool Call/Result Spans (Gemini Live)
- tool.function_name: Name of the function being called
- tool.call_id: Unique identifier for the call
- tool.arguments: Function arguments (truncated)
- tool.result: Function execution result (truncated)
- tool.result_status: "completed", "error", or "parse_error"

Turn Spans
- turn.number: Sequential turn number
- turn.type: Type of turn (e.g., "conversation")
- turn.duration_seconds: Duration of the turn
- turn.was_interrupted: Whether the turn was interrupted
- conversation.id: ID of the parent conversation

Conversation Spans
- conversation.id: Unique identifier for the conversation
- conversation.type: Type of conversation (e.g., "voice")

Usage Metrics

Pipecat's tracing implementation automatically captures usage metrics for LLM and TTS services:
- LLM token usage is captured in LLM spans as gen_ai.usage.input_tokens and gen_ai.usage.output_tokens
- TTS character count is captured in TTS spans as metrics.character_count

Performance Metrics

Pipecat traces capture key performance metrics for each service:
- Time To First Byte (TTFB): the time it takes for a service to produce its first response, recorded as metrics.ttfb (in seconds)
- Processing duration: the total time spent processing in each service is captured in the span duration

Configuration Options

PipelineTask Parameters
- enable_tracing (bool, default: True): Enable or disable tracing for the pipeline
- enable_turn_tracking (bool, default: False): Whether to enable turn tracking
- conversation_id (Optional[str], default: None): Custom ID for the conversation. If not provided, a UUID will be generated
- additional_span_attributes (Optional[dict], default: None): Any additional attributes to add to the top-level OpenTelemetry conversation span
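The span attributes listed above can also be consumed programmatically. As a hedged sketch that is not part of Pipecat's API and uses only the standard OpenTelemetry SDK, a custom span processor could log per-service TTFB and token usage as spans finish; it assumes setup_tracing() has already installed an SDK TracerProvider.

from opentelemetry import trace
from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor


class MetricsLoggingProcessor(SpanProcessor):
    """Logs latency and token-usage attributes as spans end."""

    def on_end(self, span: ReadableSpan) -> None:
        attrs = span.attributes or {}
        ttfb = attrs.get("metrics.ttfb")
        if ttfb is not None:
            print(f"{span.name}: ttfb={ttfb:.3f}s")
        input_tokens = attrs.get("gen_ai.usage.input_tokens")
        output_tokens = attrs.get("gen_ai.usage.output_tokens")
        if input_tokens is not None or output_tokens is not None:
            print(f"{span.name}: tokens in={input_tokens} out={output_tokens}")


# Attach to the provider created by setup_tracing(); assumes tracing is already initialized.
trace.get_tracer_provider().add_span_processor(MetricsLoggingProcessor())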
setup_tracing() Parameters
- service_name (str, default: "pipecat"): Name of the service for traces
- exporter (Optional[SpanExporter], default: None): A pre-configured OpenTelemetry span exporter instance
- console_export (bool, default: False): Whether to also export traces to the console (useful for debugging)

Example

Here's a complete example showing OpenTelemetry tracing setup with Jaeger:

import os

from dotenv import load_dotenv
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.utils.tracing.setup import setup_tracing

load_dotenv()

# Initialize tracing if enabled
if os.getenv("ENABLE_TRACING"):
    # Create the exporter
    otlp_exporter = OTLPSpanExporter(
        endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        insecure=True,
    )

    # Set up tracing with the exporter
    setup_tracing(
        service_name="pipecat-demo",
        exporter=otlp_exporter,
        console_export=bool(os.getenv("OTEL_CONSOLE_EXPORT")),
    )

# Create your services
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
tts = CartesiaTTSService(api_key=os.getenv("CARTESIA_API_KEY"), voice_id="your-voice-id")

# Build pipeline (transport and context_aggregator are assumed to be created elsewhere)
pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
    context_aggregator.assistant(),
])

# Create pipeline task with tracing enabled
task = PipelineTask(
    pipeline,
    params=PipelineParams(
        allow_interruptions=True,
        enable_metrics=True,
        enable_usage_metrics=True,
    ),
    enable_tracing=True,
    enable_turn_tracking=True,
    conversation_id="customer-123",  # Optional - will auto-generate if not provided
    additional_span_attributes={"session.id": "abc-123"},  # Optional - attached to the conversation span
)

# Run the pipeline
runner = PipelineRunner()
await runner.run(task)

Troubleshooting

If you're having issues with tracing:
- No Traces Visible: Ensure the OpenTelemetry packages are installed and that your collector endpoint is correct
- Missing Service Data: Verify that enable_metrics=True is set in PipelineParams
- Debugging Tracing: Enable console export with console_export=True to view traces in your logs
- Connection Errors: Check network connectivity to your trace collector
- Collector Configuration: Verify your collector is properly set up to receive traces

References
- OpenTelemetry Python Documentation
- OpenTelemetry Tracing Specification
- Jaeger Documentation
- Langfuse OpenTelemetry Documentation
|
utilities_opentelemetry_bae42f8b.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/opentelemetry#otlp-exporter-for-jaeger%2C-grafana%2C-etc
2 + Title: OpenTelemetry Tracing - Pipecat
3 + ==================================================
4 +
5 + OpenTelemetry Tracing - Pipecat (same page content as utilities_opentelemetry_1f6781e7.txt above)
|
utilities_opentelemetry_ed334555.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/opentelemetry
2 + Title: OpenTelemetry Tracing - Pipecat
3 + ==================================================
4 +
5 + OpenTelemetry Tracing - Pipecat (same page content as utilities_opentelemetry_1f6781e7.txt above)
|
utilities_transcript-processor_72e73b27.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/transcript-processor#assistanttranscriptprocessor
2 + Title: TranscriptProcessor - Pipecat
3 + ==================================================
4 +
5 +
TranscriptProcessor - Pipecat

Overview

The TranscriptProcessor is a factory class that creates and manages processors for handling conversation transcripts from both users and assistants. It provides unified access to transcript processors with shared event handling, making it easy to track and respond to conversation updates in real time. The processor normalizes messages from various sources into a consistent TranscriptionMessage format and emits events when new messages are added to the conversation.

Constructor

TranscriptProcessor()

Creates a new transcript processor factory with no parameters.

Methods

user()

def user(**kwargs) -> UserTranscriptProcessor

Get or create the user transcript processor instance. This processor handles TranscriptionFrames from STT services.
Parameters: **kwargs - arguments passed to the UserTranscriptProcessor constructor.
Returns: UserTranscriptProcessor instance for processing user messages.

assistant()

def assistant(**kwargs) -> AssistantTranscriptProcessor

Get or create the assistant transcript processor instance. This processor handles TTSTextFrames from TTS services and aggregates them into complete utterances.
Parameters: **kwargs - arguments passed to the AssistantTranscriptProcessor constructor.
Returns: AssistantTranscriptProcessor instance for processing assistant messages.

event_handler()

def event_handler(event_name: str)

Decorator that registers event handlers for both user and assistant processors.
Parameters: event_name - name of the event to handle.
Returns: Decorator function that registers the handler with both processors.

Event Handlers

on_transcript_update

Triggered when new messages are added to the conversation transcript.

@transcript.event_handler("on_transcript_update")
async def handle_transcript_update(processor, frame):
    # Handle transcript updates
    pass

Parameters:
- processor: The specific processor instance that emitted the event (UserTranscriptProcessor or AssistantTranscriptProcessor)
- frame: TranscriptionUpdateFrame containing the new messages

Data Structures

TranscriptionMessage

@dataclass
class TranscriptionMessage:
    role: Literal["user", "assistant"]
    content: str
    timestamp: str | None = None
    user_id: str | None = None

Fields:
- role: The message sender type ("user" or "assistant")
- content: The transcribed text content
- timestamp: ISO 8601 timestamp when the message was created
- user_id: Optional user identifier (for user messages only)

TranscriptionUpdateFrame

Frame containing new transcript messages, emitted by the on_transcript_update event.
Properties:
- messages: List of TranscriptionMessage objects containing the new transcript content

Frames

UserTranscriptProcessor
- Input: TranscriptionFrame from STT services
- Output: TranscriptionMessage with role "user"

AssistantTranscriptProcessor
- Input: TTSTextFrame from TTS services
- Output: TranscriptionMessage with role "assistant"

Integration Notes

Pipeline Placement

Place the processors at specific positions in your pipeline for accurate transcript collection:

pipeline = Pipeline([
    transport.input(),
    stt,                      # Speech-to-text service
    transcript.user(),        # Place after STT
    context_aggregator.user(),
    llm,
    tts,                      # Text-to-speech service
    transport.output(),
    transcript.assistant(),   # Place after transport.output()
    context_aggregator.assistant(),
])

Event Handler Registration

Event handlers are automatically applied to both user and assistant processors:

transcript = TranscriptProcessor()

# This handler will receive events from both processors
@transcript.event_handler("on_transcript_update")
async def handle_update(processor, frame):
    for message in frame.messages:
        print(f"{message.role}: {message.content}")
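Building on the handler registration above, a common pattern is to accumulate the emitted messages into a full conversation log. This is a minimal sketch rather than a documented Pipecat API: the TranscriptHandler class, the JSON output path, and the TranscriptProcessor import path are assumptions.

import json

# Assumed import path for TranscriptProcessor; verify against your pipecat version.
from pipecat.processors.transcript_processor import TranscriptProcessor


class TranscriptHandler:
    """Accumulates transcript messages and persists them as JSON."""

    def __init__(self, output_path: str = "transcript.json"):
        self.output_path = output_path
        self.messages = []

    async def on_update(self, processor, frame):
        for message in frame.messages:
            self.messages.append({
                "role": message.role,
                "content": message.content,
                "timestamp": message.timestamp,
            })
        # Persist after every update so an unexpected shutdown doesn't lose the conversation.
        with open(self.output_path, "w") as f:
            json.dump(self.messages, f, indent=2)


transcript = TranscriptProcessor()
handler = TranscriptHandler()


@transcript.event_handler("on_transcript_update")
async def handle_update(processor, frame):
    await handler.on_update(processor, frame)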
|