Add files using upload-large-folder tool
- _sources_indexrsttxt_03af5431.txt +5 -0
- audio_krisp-filter_619515d5.txt +5 -0
- audio_noisereduce-filter_d0ccd86d.txt +5 -0
- audio_silero-vad-analyzer_9565ffed.txt +5 -0
- base-classes_text_f6ec6245.txt +5 -0
- client_migration-guide_c3f58546.txt +5 -0
- client_rtvi-standard_d0dac012.txt +5 -0
- daily_rest-helpers_55865c61.txt +5 -0
- daily_rest-helpers_d35953ef.txt +5 -0
- daily_rest-helpers_e358e49b.txt +5 -0
- deployment_modal_14388797.txt +5 -0
- deployment_wwwflyio_c55ec17a.txt +5 -0
- features_gemini-multimodal-live_58404a9e.txt +5 -0
- features_openai-audio-models-and-apis_477b3ad5.txt +5 -0
- features_pipecat-flows_5892a5c1.txt +5 -0
- features_pipecat-flows_94c674f2.txt +5 -0
- filters_frame-filter_3cd853b4.txt +5 -0
- filters_function-filter_0433187d.txt +5 -0
- filters_wake-check-filter_be3b0bfa.txt +5 -0
- filters_wake-notifier-filter_deec0e70.txt +5 -0
- flows_pipecat-flows_60eeeb51.txt +5 -0
- flows_pipecat-flows_7d899c19.txt +5 -0
- frame_producer-consumer_0b939df3.txt +5 -0
- fundamentals_context-management_81019595.txt +5 -0
- fundamentals_function-calling_aa2d2f1c.txt +5 -0
- fundamentals_function-calling_aba12231.txt +5 -0
- fundamentals_recording-transcripts_b49334dd.txt +5 -0
- fundamentals_user-input-muting_85057656.txt +5 -0
- image-generation_fal_340084af.txt +5 -0
- image-generation_fal_4e43655d.txt +5 -0
- image-generation_openai_ba6382a9.txt +5 -0
- ios_introduction_2aa11c8e.txt +5 -0
- llm_gemini_6ea32a78.txt +5 -0
- llm_grok_01e8e47f.txt +5 -0
- llm_grok_fddf9bc0.txt +5 -0
- llm_ollama_015de6b2.txt +5 -0
- llm_openai_cfc7f2e1.txt +5 -0
- llm_perplexity_5fcd95f8.txt +5 -0
- observers_turn-tracking-observer_c6e5fbb9.txt +5 -0
- pipecat-transport-openai-realtime-webrtc_indexhtml_aa5b4452.txt +5 -0
- pipeline_pipeline-task_23275744.txt +5 -0
- pipeline_pipeline-task_3534b9ca.txt +5 -0
- react_components_0efbf3cd.txt +5 -0
- react_hooks_54f22d4c.txt +5 -0
- react_introduction_470e3f47.txt +5 -0
- s2s_aws_15f3d046.txt +5 -0
- s2s_gemini_13c30d1c.txt +5 -0
- s2s_gemini_3e665166.txt +5 -0
- s2s_gemini_4924f9a5.txt +5 -0
- s2s_gemini_98435b26.txt +5 -0
_sources_indexrsttxt_03af5431.txt
ADDED
URL: https://docs.pipecat.ai/server/links/_sources/index.rst.txt#next-steps
Title: Overview - Pipecat
==================================================
Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. "Multimodal" means you can use any combination of audio, video, images, and/or text in your interactions. And "real-time" means that things are happening quickly enough that it feels conversational: a back-and-forth with a bot, not submitting a query and waiting for results.

What You Can Build

- Voice Assistants: natural, real-time conversations with AI using speech recognition and synthesis
- Interactive Agents: personal coaches and meeting assistants that can understand context and provide guidance
- Multimodal Apps: applications that combine voice, video, images, and text for rich interactions
- Creative Tools: storytelling experiences and social companions that engage users
- Business Solutions: customer intake flows and support bots for automated business processes
- Complex Flows: structured conversations using Pipecat Flows for managing complex interactions

How It Works

The flow of interactions in a Pipecat application is typically straightforward:

1. The bot says something
2. The user says something
3. The bot says something
4. The user says something

This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing.

Real-time Processing

Pipecat's pipeline architecture handles both simple voice interactions and complex multimodal processing. Here is how data flows through the system.

Voice app:
1. Send Audio: transmit and capture streamed audio from the user
2. Transcribe Speech: convert speech to text as the user is talking
3. Process with LLM: generate responses using a large language model
4. Convert to Speech: transform text responses into natural speech
5. Play Audio: stream the audio response back to the user

Multimodal app:
1. Send Audio and Video: transmit and capture audio, video, and image inputs simultaneously
2. Process Streams: handle multiple input streams in parallel
3. Model Processing: send combined inputs to multimodal models (like GPT-4V)
4. Generate Outputs: create various outputs (text, images, audio, etc.)
5. Coordinate Presentation: synchronize and present multiple output types

In both cases, Pipecat:
- processes responses as they stream in
- handles multiple input/output modalities concurrently
- manages resource allocation and synchronization
- coordinates parallel processing tasks

This architecture creates fluid, natural interactions without noticeable delays, whether you are building a simple voice assistant or a complex multimodal application. Pipecat's pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved.

Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure.

Next Steps

Ready to build your first Pipecat application?
- Installation & Setup: prepare your environment and install required dependencies
- Quickstart: build and run your first Pipecat application
- Core Concepts: learn about pipelines, frames, and real-time processing
- Use Cases: explore example implementations and patterns

Join Our Community

Discord Community: connect with other developers, share your projects, and get support from the Pipecat team.
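To make the voice-app flow above concrete, here is a minimal sketch of a Pipecat pipeline that wires a transport, STT, LLM, and TTS together. The module paths and service classes (DeepgramSTTService, OpenAILLMService, CartesiaTTSService, OpenAILLMContext) are assumptions based on common Pipecat examples, not taken from this page, and they shift between releases; check the current API reference for exact names.

# Sketch only: service imports below are assumptions based on typical Pipecat
# examples; verify them against the version you have installed.
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia import CartesiaTTSService   # assumed TTS service
from pipecat.services.deepgram import DeepgramSTTService   # assumed STT service
from pipecat.services.openai import OpenAILLMService       # assumed LLM service
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def run_bot(room_url: str, token: str):
    # Transport: captures user audio and plays bot audio back (steps 1 and 5).
    transport = DailyTransport(
        room_url,
        token,
        "Respond bot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))  # step 2: transcribe
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))      # step 3: generate
    tts = CartesiaTTSService(                                        # step 4: synthesize
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id=os.getenv("CARTESIA_VOICE_ID"),
    )

    # Conversation context shared between user and assistant turns.
    context = OpenAILLMContext(
        messages=[{"role": "system", "content": "You are a helpful voice assistant."}]
    )
    agg = llm.create_context_aggregator(context)

    # Frames flow top-to-bottom through the processors in order.
    pipeline = Pipeline([
        transport.input(),    # audio frames from the user
        stt,                  # speech -> text
        agg.user(),           # add user text to the context
        llm,                  # context -> response tokens
        tts,                  # response text -> audio
        transport.output(),   # audio frames back to the user
        agg.assistant(),      # add bot response to the context
    ])

    await PipelineRunner().run(PipelineTask(pipeline))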
audio_krisp-filter_619515d5.txt
ADDED
URL: https://docs.pipecat.ai/api-reference/utilities/audio/krisp-filter
Title: Overview - Pipecat
==================================================
audio_noisereduce-filter_d0ccd86d.txt
ADDED
URL: https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter#audio-flow
Title: NoisereduceFilter - Pipecat
==================================================
Overview

NoisereduceFilter is an audio processor that reduces background noise in real-time audio streams using the noisereduce library. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise.

Installation

The noisereduce filter requires additional dependencies:

pip install "pipecat-ai[noisereduce]"

Constructor Parameters

This filter has no configurable parameters in its constructor.

Input Frames

FilterEnableFrame (Frame): control frame to toggle filtering on/off.

from pipecat.frames.frames import FilterEnableFrame

# Disable noise reduction
await task.queue_frame(FilterEnableFrame(False))

# Re-enable noise reduction
await task.queue_frame(FilterEnableFrame(True))

Usage Example

from pipecat.audio.filters.noisereduce_filter import NoisereduceFilter

transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        audio_in_filter=NoisereduceFilter(),  # Enable noise reduction
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)

Audio Flow

Notes

- Lightweight alternative to Krisp for noise reduction
- Supports real-time audio processing
- Handles PCM_16 audio format
- Thread-safe for pipeline processing
- Can be dynamically enabled/disabled
- No additional configuration required
- Uses statistical noise reduction techniques
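The FilterEnableFrame and task.queue_frame() calls above can be wrapped in a small helper so your application code can toggle noise reduction at runtime. The helper below is a sketch of ours, assuming a PipelineTask named task already exists for the pipeline that uses the transport shown above; only FilterEnableFrame and queue_frame() come from this page.

# Illustrative helper; not part of the Pipecat API.
from pipecat.frames.frames import FilterEnableFrame


async def set_noise_reduction(task, enabled: bool) -> None:
    # Queue a control frame; the filter toggles when the frame reaches it.
    await task.queue_frame(FilterEnableFrame(enabled))

# e.g. pause filtering while playing hold music, then restore it:
#   await set_noise_reduction(task, False)
#   ...
#   await set_noise_reduction(task, True)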
audio_silero-vad-analyzer_9565ffed.txt
ADDED
URL: https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer#param-confidence
Title: SileroVADAnalyzer - Pipecat
==================================================
Overview

SileroVADAnalyzer is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It provides high-accuracy speech detection with efficient processing using ONNX runtime.

Installation

The Silero VAD analyzer requires additional dependencies:

pip install "pipecat-ai[silero]"

Constructor Parameters

- sample_rate (int, default: None): Audio sample rate in Hz. Must be either 8000 or 16000.
- params (VADParams, default: VADParams()): Voice Activity Detection parameters object with these properties:
  - confidence (float, default: 0.7): Confidence threshold for speech detection. Higher values make detection more strict. Must be between 0 and 1.
  - start_secs (float, default: 0.2): Time in seconds that speech must be detected before transitioning to the SPEAKING state.
  - stop_secs (float, default: 0.8): Time in seconds of silence required before transitioning back to the QUIET state.
  - min_volume (float, default: 0.6): Minimum audio volume threshold for speech detection. Must be between 0 and 1.

Usage Example

transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)),
    ),
)

Technical Details

Sample Rate Requirements

The analyzer supports two sample rates:
- 8000 Hz (256 samples per frame)
- 16000 Hz (512 samples per frame)

Model Management

- Uses ONNX runtime for efficient inference
- Automatically resets model state every 5 seconds to manage memory
- Runs on CPU by default for consistent performance
- Includes built-in model file

Notes

- High-accuracy speech detection
- Efficient ONNX-based processing
- Automatic memory management
- Thread-safe for pipeline processing
- Built-in model file included
- CPU-optimized inference
- Supports 8 kHz and 16 kHz audio
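The constructor parameters above can be combined; here is a hedged sketch that tunes every VADParams field for a noisier environment. The import paths are assumptions based on typical Pipecat layouts (they are not shown on this page), so verify them against your installed version.

# Sketch: all four VADParams fields from this page, tuned for a noisier room.
from pipecat.audio.vad.silero import SileroVADAnalyzer      # assumed import path
from pipecat.audio.vad.vad_analyzer import VADParams        # assumed import path

vad = SileroVADAnalyzer(
    sample_rate=16000,          # must be 8000 or 16000
    params=VADParams(
        confidence=0.8,         # stricter speech detection (default 0.7)
        start_secs=0.2,         # speech required before SPEAKING (default 0.2)
        stop_secs=0.5,          # silence required before QUIET (default 0.8)
        min_volume=0.7,         # ignore quiet background chatter (default 0.6)
    ),
)

# Pass the analyzer to your transport, e.g. DailyParams(vad_analyzer=vad, ...).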
base-classes_text_f6ec6245.txt
ADDED
URL: https://docs.pipecat.ai/server/base-classes/text#what-you-can-build
Title: Overview - Pipecat
==================================================
client_migration-guide_c3f58546.txt
ADDED
URL: https://docs.pipecat.ai/client/migration-guide#key-changes
Title: RTVIClient Migration Guide - Pipecat
==================================================
This guide covers the high-level changes between the old RTVIClient and the new PipecatClient. For specific code updates, refer to the platform-specific migration guides.

Key changes

- Client Name: The class name has changed from RTVIClient to PipecatClient.
- Pipeline Connection: Previously, the client expected a REST endpoint for gathering connection information as part of the constructor, which was difficult to update or bypass. The new client expects connection information to be provided directly to the connect() method, either as an object with the details your Transport requires or as an object with REST endpoint details for acquiring them.
- Actions and helpers: These have gone away in favor of built-in methods for common actions like function call handling and appending to the LLM context, and, for custom actions, a simple set of methods for sending messages to the bot and handling responses. See registerFunctionCallHandler(), appendToContext(), sendClientMessage(), and sendClientRequest() for more details.
- Bot Configuration: This functionality has been removed as a security measure, so that a client cannot inherently override a bot's configuration and use its credentials at will. If you need the client to initialize or update the bot configuration, do so through an API call to your backend or by building on top of the client-server messaging, which has now been made easier.

The Client SDKs are currently in the process of making these changes. At this time, only the JavaScript and React libraries have been updated and released. Their documentation, along with this top-level documentation, has been updated to reflect the latest changes. The React Native, iOS, and Android SDKs are still being updated; their documentation will be updated and a migration guide provided once the new versions are released. If you have any questions or need assistance, please reach out to us on Discord.

Migration guides

- JavaScript: migrate your JavaScript client code to the new PipecatClient
- React: update your React components to use the new PipecatClient
client_rtvi-standard_d0dac012.txt
ADDED
URL: https://docs.pipecat.ai/client/rtvi-standard#terms
Title: The RTVI Standard - Pipecat
==================================================
The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication, and it provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real time. This page documents version 1.0 of the RTVI standard, released in June 2025.

Key Features

- Connection Management: RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state.
- Transcriptions: The standard includes built-in support for real-time transcription of audio streams.
- Client-Server Messaging: The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses.
- Advanced LLM Interactions: The standard supports advanced interactions with large language models (LLMs), including context management, function call handling, and search results.
- Service-Specific Insights: RTVI supports events that provide insight into the input/output and state of typical services in speech-to-speech workflows.
- Metrics and Monitoring: RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services.

Terms

- Client: The front-end application or user interface that interacts with the RTVI server.
- Server: The back-end service that runs the AI framework and processes requests from the client.
- User: The end user interacting with the client application.
- Bot: The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service.

RTVI Message Format

Messages defined as part of the RTVI protocol adhere to the following format:

{
  "id": string,
  "label": "rtvi-ai",
  "type": string,
  "data": unknown
}

- id (string): A unique identifier for the message, used to correlate requests and responses.
- label (string, required, default "rtvi-ai"): A label that identifies this message as an RTVI message. It should always be set to 'rtvi-ai'.
- type (string, required): The type of message being sent. Must be one of the predefined RTVI message types listed below.
- data (unknown): The payload of the message, which can be any data structure relevant to the message type.

RTVI Message Types

Following the above format, this section describes the message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers, and each is marked below as sent from the bot or from the client.

Connection Management

client-ready (client): Indicates that the client is ready to receive messages and interact with the server. Typically sent after the transport media channels have connected.
- type: 'client-ready'
- data:
  - version (string): The version of the RTVI standard being used, useful for ensuring compatibility between client and server implementations.
  - about (AboutClient object): Information about the client, such as its rtvi-version, client library, and any other relevant metadata. The AboutClient object contains:
    - library (string, required)
    - library_version (string)
    - platform (string)
    - platform_version (string)
    - platform_details (any): Any platform-specific details that may be relevant to the server, such as information about the browser, operating system, or other environment-specific data. This field is optional and open-ended, so be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personally identifiable information.

bot-ready (bot): Indicates that the bot is ready to receive messages and interact with the client. Typically sent after the transport media channels have connected.
- type: 'bot-ready'
- data:
  - version (string): The version of the RTVI standard being used.
  - about (any, optional): Information about the server or bot. Its structure and value are both undefined by default, which provides flexibility to include any metadata your client may need to know about the server at connection time, without any built-in security concerns. Be mindful of the data you include here and any security concerns that may arise from exposing sensitive information.

disconnect-bot (client): Indicates that the client wishes to disconnect from the bot, typically when the client is shutting down or no longer needs to interact with the bot. Note: disconnects should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client wants to remain connected to the transport but no longer wishes to interact with the bot.
- type: 'disconnect-bot'
- data: undefined

error (bot): Indicates an error occurred during bot initialization or runtime.
- type: 'error'
- data:
  - message (string): Description of the error.
  - fatal (boolean): Indicates if the error is fatal to the session.

Transcription

user-started-speaking (bot): Emitted when the user begins speaking. type: 'user-started-speaking', data: none.
user-stopped-speaking (bot): Emitted when the user stops speaking. type: 'user-stopped-speaking', data: none.
bot-started-speaking (bot): Emitted when the bot begins speaking. type: 'bot-started-speaking', data: none.
bot-stopped-speaking (bot): Emitted when the bot stops speaking. type: 'bot-stopped-speaking', data: none.

user-transcription (bot): Real-time transcription of user speech, including both partial and final results.
- type: 'user-transcription'
- data:
  - text (string): The transcribed text of the user.
  - final (boolean): Indicates if this is a final transcription or a partial result.
  - timestamp (string): The timestamp when the transcription was generated.
  - user_id (string): Identifier for the user who spoke.

bot-transcription (bot): Transcription of the bot's speech. Note: this protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions. Rather, the event is typically sent for each sentence of the bot's response. This difference is currently due to limitations in TTS services, which mostly do not support (or do not support well) accurate timing information. If and when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription that matches your bot's speaking, you can try using the bot-tts-text message type.
- type: 'bot-transcription'
- data:
  - text (string): The transcribed text from the bot, typically aggregated at a per-sentence level.

Client-Server Messaging

server-message (bot): An arbitrary message sent from the server to the client, usable for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client.
- type: 'server-message'
- data: any JSON-serializable object, formatted according to your own specifications.

client-message (client): An arbitrary message sent from the client to the server, usable for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server.
- type: 'client-message'
- data: a t field (string) indicating the type of message and an optional d field (unknown) containing any custom, corresponding data needed for the message.

server-response (bot): A message sent from the server to the client in response to a client-message. IMPORTANT: the id should match the id of the original client-message to correlate the response with the request.
- type: 'server-response'
- data: a t field (string) indicating the type of message and an optional d field (unknown) containing any custom, corresponding data needed for the message.

error-response (bot): Error response to a specific client message. IMPORTANT: the id should match the id of the original client-message to correlate the response with the request.
- type: 'error-response'
- data:
  - error (string)

Advanced LLM Interactions

append-to-context (client): A message sent from the client to the server to append data to the context of the current LLM conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant.
- type: 'append-to-context'
- data:
  - role ("user" | "assistant"): The role the content should be appended to. Currently only "user" and "assistant" are supported.
  - content (unknown): The content to append to the context. This can be any data structure the LLM understands.
  - run_immediately (boolean, optional): Whether the context should be run immediately after appending. Defaults to false; if false, the content is appended but not executed until the next LLM run.

llm-function-call (bot): A function call request from the LLM, sent from the bot to the client. Note that in most cases an LLM function call will be handled completely server-side; this message/response schema is for cases where the call requires input from the client or the client needs to be aware of the function call.
- type: 'llm-function-call'
- data:
  - function_name (string): Name of the function to be called.
  - tool_call_id (string): Unique identifier for this function call.
  - args (Record<string, unknown>): Arguments to be passed to the function.

llm-function-call-result (client): The result of the function call requested by the LLM, returned from the client.
- type: 'llm-function-call-result'
- data:
  - function_name (string): Name of the called function.
  - tool_call_id (string): Identifier matching the original function call.
  - args (Record<string, unknown>): Arguments that were passed to the function.
  - result (Record<string, unknown> | string): The result returned by the function.

bot-llm-search-response (bot): Search results from the LLM's knowledge base. Currently, Google Gemini is the only LLM that supports built-in search; however, we expect other LLMs to follow suit, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies.
- type: 'bot-llm-search-response'
- data:
  - search_result (string, optional): Raw search result text.
  - rendered_content (string, optional): Formatted version of the search results.
  - origins (Array<Origin object>): Source information and confidence scores for search results.

The Origin object follows this structure:

{
  "site_uri": string (optional),
  "site_title": string (optional),
  "results": Array<{
    "text": string,
    "confidence": number[]
  }>
}

Example:

"id": undefined
"label": "rtvi-ai"
"type": "bot-llm-search-response"
"data": {
  "origins": [
    {
      "results": [
        {
          "confidence": [0.9881149530410768],
          "text": "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm."
        },
        {
          "confidence": [0.9692034721374512],
          "text": "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm."
        }
      ],
      "site_title": "vanderbilt.edu",
      "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi"
    },
    {
      "results": [
        {
          "confidence": [0.6554043292999268],
          "text": "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields."
        }
      ],
      "site_title": "wikipedia.org",
      "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo"
    }
  ],
  "rendered_content": "<style> \n .container ... </div> \n </div> \n ",
  "search_result": "Several events are happening at Vanderbilt University: \n\n * Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm. \n * A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm. \n\n In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields. For the most recent news, you should check Vanderbilt's official news website. \n "
}

Service-Specific Insights

bot-llm-started (bot): Indicates LLM processing has begun. type: 'bot-llm-started', data: none.
bot-llm-stopped (bot): Indicates LLM processing has completed. type: 'bot-llm-stopped', data: none.

user-llm-text (bot): Aggregated user input text that is sent to the LLM.
- type: 'user-llm-text'
- data:
  - text (string): The user's input text to be processed by the LLM.

bot-llm-text (bot): Individual tokens streamed from the LLM as they are generated.
- type: 'bot-llm-text'
- data:
  - text (string): The token text from the LLM.

bot-tts-started (bot): Indicates text-to-speech (TTS) processing has begun. type: 'bot-tts-started', data: none.
bot-tts-stopped (bot): Indicates text-to-speech (TTS) processing has completed. type: 'bot-tts-stopped', data: none.

bot-tts-text (bot): The per-token text output of the text-to-speech (TTS) service (what the TTS actually says).
- type: 'bot-tts-text'
- data:
  - text (string): The text representation of the generated bot speech.

Metrics and Monitoring

metrics (bot): Performance metrics for various processing stages and services. Each message contains entries for one or more of the metric types: processing, ttfb, characters.
- type: 'metrics'
- data:
  - processing (optional): Processing time metrics.
  - ttfb (optional): Time-to-first-byte metrics.
  - characters (optional): Character processing metrics.

For each metric type, the data structure is an array of objects with the following fields:
- processor (string): The name of the processor or service that generated the metric.
- value (number): The value of the metric, typically in milliseconds or character count.
- model (string, optional): The model of the service that generated the metric, if applicable.

Example:

{
  "type": "metrics",
  "data": {
    "processing": [
      {
        "model": "eleven_flash_v2_5",
        "processor": "ElevenLabsTTSService#0",
        "value": 0.0005140304565429688
      }
    ],
    "ttfb": [
      {
        "model": "eleven_flash_v2_5",
        "processor": "ElevenLabsTTSService#0",
        "value": 0.1573178768157959
      }
    ],
    "characters": [
      {
        "model": "eleven_flash_v2_5",
        "processor": "ElevenLabsTTSService#0",
        "value": 43
      }
    ]
  }
}
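As a concrete illustration of the message format, here is a hedged Python sketch that builds RTVI envelopes for a client-message and its matching server-response. The helper functions are ours, not part of Pipecat or any RTVI SDK; only the envelope fields (id, label, type, data) and the t/d payload convention come from the standard described above.

# Illustrative only: these helpers are not part of Pipecat or an RTVI SDK.
import json
import uuid


def rtvi_message(msg_type: str, data=None, msg_id=None) -> dict:
    """Build an RTVI envelope: id, label ('rtvi-ai'), type, data."""
    return {
        "id": msg_id or str(uuid.uuid4()),
        "label": "rtvi-ai",
        "type": msg_type,
        "data": data,
    }


def client_message(t: str, d=None) -> dict:
    """A custom client -> server message using the t/d payload convention."""
    return rtvi_message("client-message", {"t": t, "d": d})


def server_response(request: dict, t: str, d=None) -> dict:
    """A server -> client response; its id must match the originating client-message."""
    return rtvi_message("server-response", {"t": t, "d": d}, msg_id=request["id"])


# Example: a hypothetical "get-weather" request and its correlated response.
req = client_message("get-weather", {"city": "Nashville"})
resp = server_response(req, "get-weather-result", {"temp_f": 72})
assert resp["id"] == req["id"] and resp["label"] == "rtvi-ai"
print(json.dumps(req))
print(json.dumps(resp))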
daily_rest-helpers_55865c61.txt
ADDED
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-enable-prejoin-ui
Title: Daily REST Helper - Pipecat
==================================================
For the complete Daily REST API reference and additional details, see the Daily REST API documentation.

Classes

DailyRoomSipParams

Configuration for SIP (Session Initiation Protocol) parameters.
- display_name (string, default: "sw-sip-dialin"): Display name for the SIP endpoint
- video (boolean, default: false): Whether video is enabled for SIP
- sip_mode (string, default: "dial-in"): SIP connection mode
- num_endpoints (integer, default: 1): Number of SIP endpoints

from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams

sip_params = DailyRoomSipParams(
    display_name="conference-line",
    video=True,
    num_endpoints=2,
)

RecordingsBucketConfig

Configuration for storing Daily recordings in a custom S3 bucket.
- bucket_name (string, required): Name of the S3 bucket for storing recordings
- bucket_region (string, required): AWS region where the S3 bucket is located
- assume_role_arn (string, required): ARN of the IAM role to assume for S3 access
- allow_api_access (boolean, default: false): Whether to allow API access to the recordings

from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig

bucket_config = RecordingsBucketConfig(
    bucket_name="my-recordings-bucket",
    bucket_region="us-west-2",
    assume_role_arn="arn:aws:iam::123456789012:role/DailyRecordingsRole",
    allow_api_access=True,
)

DailyRoomProperties

Properties that configure a Daily room's behavior and features.
- exp (float): Room expiration time as a Unix timestamp (e.g., time.time() + 300 for 5 minutes)
- enable_chat (boolean, default: false): Whether chat is enabled in the room
- enable_prejoin_ui (boolean, default: false): Whether the prejoin lobby UI is enabled
- enable_emoji_reactions (boolean, default: false): Whether emoji reactions are enabled
- eject_at_room_exp (boolean, default: false): Whether to eject participants when the room expires
- enable_dialout (boolean): Whether dial-out is enabled
- enable_recording (string): Recording settings ("cloud", "local", or "raw-tracks")
- geo (string): Geographic region for the room
- max_participants (number): Maximum number of participants allowed in the room
- recordings_bucket (RecordingsBucketConfig): Configuration for custom S3 bucket recordings
- sip (DailyRoomSipParams): SIP configuration parameters
- sip_uri (dict): SIP URI configuration (returned by Daily)
- start_video_off (boolean, default: false): Whether the camera video is turned off by default

The class also includes a sip_endpoint property that returns the SIP endpoint URI if available.

import time

from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomProperties,
    DailyRoomSipParams,
    RecordingsBucketConfig,
)

properties = DailyRoomProperties(
    exp=time.time() + 3600,  # 1 hour from now
    enable_chat=True,
    enable_emoji_reactions=True,
    enable_recording="cloud",
    geo="us-west",
    max_participants=50,
    sip=DailyRoomSipParams(display_name="conference"),
    recordings_bucket=RecordingsBucketConfig(
        bucket_name="my-bucket",
        bucket_region="us-west-2",
        assume_role_arn="arn:aws:iam::123456789012:role/DailyRole",
    ),
)

# Access SIP endpoint if available
if properties.sip_endpoint:
    print(f"SIP endpoint: {properties.sip_endpoint}")

DailyRoomParams

Parameters for creating a new Daily room.
- name (string): Room name (if not provided, one will be generated)
- privacy (string, default: "public"): Room privacy setting ("private" or "public")
- properties (DailyRoomProperties): Room configuration properties

import time

from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomParams,
    DailyRoomProperties,
)

params = DailyRoomParams(
    name="team-meeting",
    privacy="private",
    properties=DailyRoomProperties(
        enable_chat=True,
        exp=time.time() + 7200,  # 2 hours from now
    ),
)

DailyRoomObject

Response object representing a Daily room.
- id (string): Unique room identifier
- name (string): Room name
- api_created (boolean): Whether the room was created via API
- privacy (string): Room privacy setting
- url (string): Complete room URL
- created_at (string): Room creation timestamp in ISO 8601 format
- config (DailyRoomProperties): Room configuration

from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomObject,
    DailyRoomProperties,
)

# Example of what a DailyRoomObject looks like when received
room = DailyRoomObject(
    id="abc123",
    name="team-meeting",
    api_created=True,
    privacy="private",
    url="https://your-domain.daily.co/team-meeting",
    created_at="2024-01-20T10:00:00.000Z",
    config=DailyRoomProperties(enable_chat=True, exp=1705743600),
)

DailyMeetingTokenProperties

Properties for configuring a Daily meeting token.
- room_name (string): The room this token is valid for. If not set, the token is valid for all rooms.
- eject_at_token_exp (boolean): Whether to eject the user when the token expires
- eject_after_elapsed (integer): Eject the user after this many seconds
- nbf (integer): "Not before" timestamp - users cannot join before this time
- exp (integer): Expiration timestamp - users cannot join after this time
- is_owner (boolean): Whether the token grants owner privileges
- user_name (string): User's display name in the meeting
- user_id (string): Unique identifier for the user (36 character limit)
- enable_screenshare (boolean): Whether the user can share their screen
- start_video_off (boolean): Whether to join with video off
- start_audio_off (boolean): Whether to join with audio off
- enable_recording (string): Recording settings ("cloud", "local", or "raw-tracks")
- enable_prejoin_ui (boolean): Whether to show the prejoin UI
- start_cloud_recording (boolean): Whether to start cloud recording when the user joins
- permissions (dict): Initial default permissions for a non-meeting-owner participant

DailyMeetingTokenParams

Parameters for creating a Daily meeting token.
- properties (DailyMeetingTokenProperties): Token configuration properties

from pipecat.transports.services.helpers.daily_rest import (
    DailyMeetingTokenParams,
    DailyMeetingTokenProperties,
)

token_params = DailyMeetingTokenParams(
    properties=DailyMeetingTokenProperties(
        user_name="John Doe",
        enable_screenshare=True,
        start_video_off=True,
        permissions={"canSend": ["video", "audio"]},
    )
)

Initialize DailyRESTHelper

Create a new instance of the Daily REST helper.
- daily_api_key (string, required): Your Daily API key
- daily_api_url (string, default: "https://api.daily.co/v1"): The Daily API base URL
- aiohttp_session (aiohttp.ClientSession, required): An aiohttp client session for making HTTP requests

helper = DailyRESTHelper(
    daily_api_key="your-api-key",
    aiohttp_session=session,
)

Create Room

Creates a new Daily room with specified parameters.
- params (DailyRoomParams, required): Room configuration parameters including name, privacy, and properties

# Create a room that expires in 1 hour
params = DailyRoomParams(
    name="my-room",
    privacy="private",
    properties=DailyRoomProperties(exp=time.time() + 3600, enable_chat=True),
)
room = await helper.create_room(params)
print(f"Room URL: {room.url}")

Get Room From URL

Retrieves room information using a Daily room URL.
- room_url (string, required): The complete Daily room URL

room = await helper.get_room_from_url("https://your-domain.daily.co/my-room")
print(f"Room name: {room.name}")

Get Token

Generates a meeting token for a specific room.
- room_url (string, required): The complete Daily room URL
- expiry_time (float, default: 3600): Token expiration time in seconds
- eject_at_token_exp (bool, default: False): Whether to eject the user when the token expires
- owner (bool, default: True): Whether the token should have owner privileges (overrides any setting in params)
- params (DailyMeetingTokenParams): Additional token configuration. Note that room_name, exp, eject_at_token_exp, and is_owner will be set based on the other function parameters.

# Basic token generation
token = await helper.get_token(
    room_url="https://your-domain.daily.co/my-room",
    expiry_time=1800,  # 30 minutes
    owner=True,
    eject_at_token_exp=True,
)

# Advanced token generation with additional properties
token_params = DailyMeetingTokenParams(
    properties=DailyMeetingTokenProperties(
        user_name="John Doe",
        start_video_off=True,
    )
)
token = await helper.get_token(
    room_url="https://your-domain.daily.co/my-room",
    expiry_time=1800,
    owner=False,
    eject_at_token_exp=True,
    params=token_params,
)

Delete Room By URL

Deletes a room using its URL.
- room_url (string, required): The complete Daily room URL

success = await helper.delete_room_by_url("https://your-domain.daily.co/my-room")
if success:
    print("Room deleted successfully")

Delete Room By Name

Deletes a room using its name.
- room_name (string, required): The name of the Daily room

success = await helper.delete_room_by_name("my-room")
if success:
    print("Room deleted successfully")

Get Name From URL

Extracts the room name from a Daily room URL.
- room_url (string, required): The complete Daily room URL

room_name = helper.get_name_from_url("https://your-domain.daily.co/my-room")
print(f"Room name: {room_name}")  # Outputs: "my-room"
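Tying the helper methods above together, here is a hedged end-to-end sketch: open an aiohttp session, create a short-lived private room, mint a non-owner token for a participant, and delete the room afterwards. The helper calls and their parameters come from this page; the surrounding script (the DAILY_API_KEY environment variable and the overall flow) is illustrative.

# Illustrative end-to-end flow using only the helper calls documented above.
import asyncio
import os
import time

import aiohttp

from pipecat.transports.services.helpers.daily_rest import (
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
)


async def main():
    async with aiohttp.ClientSession() as session:
        helper = DailyRESTHelper(
            daily_api_key=os.getenv("DAILY_API_KEY"),
            aiohttp_session=session,
        )

        # Private room that expires in 15 minutes.
        room = await helper.create_room(
            DailyRoomParams(
                privacy="private",
                properties=DailyRoomProperties(exp=time.time() + 15 * 60),
            )
        )
        print(f"Room: {room.url}")

        # Non-owner token for a participant, ejected when the token expires.
        token = await helper.get_token(
            room_url=room.url,
            expiry_time=15 * 60,
            owner=False,
            eject_at_token_exp=True,
        )
        print(f"Token: {token[:16]}...")

        # Clean up when the session is over.
        if await helper.delete_room_by_name(room.name):
            print("Room deleted")


if __name__ == "__main__":
    asyncio.run(main())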
daily_rest-helpers_d35953ef.txt
ADDED
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-sip-mode
Title: Daily REST Helper - Pipecat
==================================================
Daily REST Helper - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details β Classes β DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. β display_name string default: "sw-sip-dialin" Display name for the SIP endpoint β video boolean default: false Whether video is enabled for SIP β sip_mode string default: "dial-in" SIP connection mode β num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) β RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. β bucket_name string required Name of the S3 bucket for storing recordings β bucket_region string required AWS region where the S3 bucket is located β assume_role_arn string required ARN of the IAM role to assume for S3 access β allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) β DailyRoomProperties Properties that configure a Daily roomβs behavior and features. β exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) β enable_chat boolean default: false Whether chat is enabled in the room β enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled β enable_emoji_reactions boolean default: false Whether emoji reactions are enabled β eject_at_room_exp boolean default: false Whether to eject participants when room expires β enable_dialout boolean Whether dial-out is enabled β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β geo string Geographic region for room β max_participants number Maximum number of participants allowed in the room β recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings β sip DailyRoomSipParams SIP configuration parameters β sip_uri dict SIP URI configuration (returned by Daily) β start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) β DailyRoomParams Parameters for creating a new Daily room. β name string Room name (if not provided, one will be generated) β privacy string default: "public" Room privacy setting (βprivateβ or βpublicβ) β properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) β DailyRoomObject Response object representing a Daily room. β id string Unique room identifier β name string Room name β api_created boolean Whether the room was created via API β privacy string Room privacy setting β url string Complete room URL β created_at string Room creation timestamp in ISO 8601 format β config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) β DailyMeetingTokenProperties Properties for configuring a Daily meeting token. β room_name string The room this token is valid for. If not set, token is valid for all rooms. β eject_at_token_exp boolean Whether to eject user when token expires β eject_after_elapsed integer Eject user after this many seconds β nbf integer βNot beforeβ timestamp - users cannot join before this time β exp integer Expiration timestamp - users cannot join after this time β is_owner boolean Whether token grants owner privileges β user_name string Userβs display name in the meeting β user_id string Unique identifier for the user (36 char limit) β enable_screenshare boolean Whether user can share their screen β start_video_off boolean Whether to join with video off β start_audio_off boolean Whether to join with audio off β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β enable_prejoin_ui boolean Whether to show prejoin UI β start_cloud_recording boolean Whether to start cloud recording when user joins β permissions dict Initial default permissions for a non-meeting-owner participant β DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
β properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) β Initialize DailyRESTHelper Create a new instance of the Daily REST helper. β daily_api_key string required Your Daily API key β daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL β aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) β Create Room Creates a new Daily room with specified parameters. β params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) β Get Room From URL Retrieves room information using a Daily room URL. β room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) β Get Token Generates a meeting token for a specific room. β room_url string required The complete Daily room URL β expiry_time float default: "3600" Token expiration time in seconds β eject_at_token_exp bool default: "False" Whether to eject user when token expires β owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) β params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) β Delete Room By URL Deletes a room using its URL. β room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) β Delete Room By Name Deletes a room using its name. β room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) β Get Name From URL Extracts the room name from a Daily room URL. 
room_url string required The complete Daily room URL room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room"
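Putting the helper methods above together, here is a minimal end-to-end sketch. It assumes a valid DAILY_API_KEY in your environment; the 30-minute room lifetime and token expiry are illustrative values, not defaults.

import asyncio
import os
import time

import aiohttp

from pipecat.transports.services.helpers.daily_rest import (
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
)


async def main():
    async with aiohttp.ClientSession() as session:
        helper = DailyRESTHelper(
            daily_api_key=os.getenv("DAILY_API_KEY"),
            aiohttp_session=session,
        )

        # Create a private room that expires in 30 minutes
        room = await helper.create_room(
            DailyRoomParams(
                privacy="private",
                properties=DailyRoomProperties(exp=time.time() + 1800),
            )
        )
        print(f"Room URL: {room.url}")

        # Issue a matching owner token for a bot to join with
        token = await helper.get_token(room_url=room.url, expiry_time=1800, owner=True)
        print(f"Token: {token[:16]}...")

        # Clean up once the session is over
        deleted = await helper.delete_room_by_url(room.url)
        print(f"Room deleted: {deleted}")


if __name__ == "__main__":
    asyncio.run(main())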
|
daily_rest-helpers_e358e49b.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-enable-chat
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details β Classes β DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. β display_name string default: "sw-sip-dialin" Display name for the SIP endpoint β video boolean default: false Whether video is enabled for SIP β sip_mode string default: "dial-in" SIP connection mode β num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) β RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. β bucket_name string required Name of the S3 bucket for storing recordings β bucket_region string required AWS region where the S3 bucket is located β assume_role_arn string required ARN of the IAM role to assume for S3 access β allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) β DailyRoomProperties Properties that configure a Daily roomβs behavior and features. β exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) β enable_chat boolean default: false Whether chat is enabled in the room β enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled β enable_emoji_reactions boolean default: false Whether emoji reactions are enabled β eject_at_room_exp boolean default: false Whether to eject participants when room expires β enable_dialout boolean Whether dial-out is enabled β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β geo string Geographic region for room β max_participants number Maximum number of participants allowed in the room β recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings β sip DailyRoomSipParams SIP configuration parameters β sip_uri dict SIP URI configuration (returned by Daily) β start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) β DailyRoomParams Parameters for creating a new Daily room. β name string Room name (if not provided, one will be generated) β privacy string default: "public" Room privacy setting (βprivateβ or βpublicβ) β properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) β DailyRoomObject Response object representing a Daily room. β id string Unique room identifier β name string Room name β api_created boolean Whether the room was created via API β privacy string Room privacy setting β url string Complete room URL β created_at string Room creation timestamp in ISO 8601 format β config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) β DailyMeetingTokenProperties Properties for configuring a Daily meeting token. β room_name string The room this token is valid for. If not set, token is valid for all rooms. β eject_at_token_exp boolean Whether to eject user when token expires β eject_after_elapsed integer Eject user after this many seconds β nbf integer βNot beforeβ timestamp - users cannot join before this time β exp integer Expiration timestamp - users cannot join after this time β is_owner boolean Whether token grants owner privileges β user_name string Userβs display name in the meeting β user_id string Unique identifier for the user (36 char limit) β enable_screenshare boolean Whether user can share their screen β start_video_off boolean Whether to join with video off β start_audio_off boolean Whether to join with audio off β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β enable_prejoin_ui boolean Whether to show prejoin UI β start_cloud_recording boolean Whether to start cloud recording when user joins β permissions dict Initial default permissions for a non-meeting-owner participant β DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
β properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) β Initialize DailyRESTHelper Create a new instance of the Daily REST helper. β daily_api_key string required Your Daily API key β daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL β aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) β Create Room Creates a new Daily room with specified parameters. β params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) β Get Room From URL Retrieves room information using a Daily room URL. β room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) β Get Token Generates a meeting token for a specific room. β room_url string required The complete Daily room URL β expiry_time float default: "3600" Token expiration time in seconds β eject_at_token_exp bool default: "False" Whether to eject user when token expires β owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) β params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) β Delete Room By URL Deletes a room using its URL. β room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) β Delete Room By Name Deletes a room using its name. β room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) β Get Name From URL Extracts the room name from a Daily room URL. 
room_url string required The complete Daily room URL room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room"
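The same helper is what typically backs a server's /connect endpoint for RTVI clients. A rough sketch under assumptions: the FastAPI wiring and the commented-out bot launch below are illustrative only, not code from this page; see the deployment guides for complete servers.

import os
import time

import aiohttp
from fastapi import FastAPI

from pipecat.transports.services.helpers.daily_rest import (
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
)

app = FastAPI()


@app.post("/connect")
async def connect():
    # For brevity this opens a new session per request; production code
    # would normally reuse a single aiohttp session.
    async with aiohttp.ClientSession() as session:
        helper = DailyRESTHelper(
            daily_api_key=os.getenv("DAILY_API_KEY"),
            aiohttp_session=session,
        )
        room = await helper.create_room(
            DailyRoomParams(properties=DailyRoomProperties(exp=time.time() + 3600))
        )
        token = await helper.get_token(room_url=room.url, expiry_time=3600)
        # Launch your bot process for this room here (not shown), passing
        # room.url and token to it, then return credentials to the client.
        return {"room_url": room.url, "token": token}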
|
deployment_modal_14388797.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/deployment/modal#navigating-your-llm%2C-server%2C-and-pipecat-logs
|
2 |
+
Title: Example: Modal - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Example: Modal - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Deploying your bot Example: Modal Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Modal is well-suited for Pipecat deployments because it handles container orchestration, scaling, and cold starts efficiently. This makes it a good choice for production Pipecat bots that need reliable performance. This guide walks through the Modal example included in the Pipecat repository, which follows the same deployment pattern . Modal example View the complete Modal deployment example in our GitHub repository β Install the Modal CLI Set up Modal Follow Modalβs official instructions for creating an account and setting up the CLI β Deploy a self-serve LLM Deploy Modalβs OpenAI-compatible LLM service: Copy Ask AI git clone https://github.com/modal-labs/modal-examples cd modal-examples modal deploy 06_gpu_and_ml/llm-serving/vllm_inference.py Refer to Modalβs guide and example for Deploying an OpenAI-compatible LLM service with vLLM for more details. Take note of the endpoint URL from the previous step, which will look like: Copy Ask AI https://{your-workspace}--example-vllm-openai-compatible-serve.modal.run Youβll need this for the bot_vllm.py file in the next section. The default Modal LLM example uses Llama-3.1 and will shut down after 15 minutes of inactivity. Cold starts take 5-10 minutes. To prepare the service, we recommend visiting the /docs endpoint ( https://<Modal workspace>--example-vllm-openai-compatible-serve.modal.run/docs ) for your deployed LLM and wait for it to fully load before connecting your client. β Deploy FastAPI App and Pipecat pipeline to Modal Setup environment variables: Copy Ask AI cd server cp env.example .env # Modify .env to provide your service API Keys Alternatively, you can configure your Modal app to use secrets . Update the modal_url in server/src/bot_vllm.py to point to the URL you received from the self-serve LLM deployment in the previous step. From within the server directory, test the app locally: Copy Ask AI modal serve app.py Deploy to production: Copy Ask AI modal deploy app.py Note the endpoint URL produced from this deployment. It will look like: Copy Ask AI https:// {your-workspace} --pipecat-modal-fastapi-app.modal.run Youβll need this URL for the clientβs app.js configuration mentioned in its README. β Launch your bots on Modal β Option 1: Direct Link Simply click on the URL displayed after running the server or deploy step to launch an agent and be redirected to a Daily room to talk with the launched bot. This will use the OpenAI pipeline. β Option 2: Connect via an RTVI Client Follow the instructions provided in the client folderβs README for building and running a custom client that connects to your Modal endpoint. The provided client includes a dropdown for choosing which bot pipeline to run. 
β Navigating your LLM, server, and Pipecat logs On your Modal dashboard , you should have two Apps listed under Live Apps: example-vllm-openai-compatible : This App contains the containers and logs used to run your self-hosted LLM. There will be just one App Function listed: serve . Click on this function to view logs for your LLM. pipecat-modal : This App contains the containers and logs used to run your connect endpoints and Pipecat pipelines. It will list two App Functions: fastapi_app : This function is running the endpoints that your client will interact with and initiate starting a new pipeline ( / , /connect , /status ). Click on this function to see logs for each endpoint hit. bot_runner : This function handles launching and running a bot pipeline. Click on this function to get a list of all pipeline runs and access each runβs logs. β Modal & Pipecat Tips In most other Pipecat examples, we use Popen to launch the pipeline process from the /connect endpoint. In this example, we use a Modal function instead. This allows us to run the pipelines using a separately defined Modal image as well as run each pipeline in an isolated container. For the FastAPI and most common Pipecat Pipeline containers, a default debian_slim CPU-only should be all thatβs required to run. GPU containers are needed for self-hosted services. To minimize cold starts of the pipeline and reduce latency for users, set min_containers=1 on the Modal Function that launches the pipeline to ensure at least one warm instance of your function is always available. β Next steps Explore Modal's LLM Examples For next steps on running a self-hosted LLM and reducing latency, check out all of Modalβs LLM examples Example: Cerebrium On this page Install the Modal CLI Deploy a self-serve LLM Deploy FastAPI App and Pipecat pipeline to Modal Launch your bots on Modal Option 1: Direct Link Option 2: Connect via an RTVI Client Navigating your LLM, server, and Pipecat logs Modal & Pipecat Tips Next steps Assistant Responses are generated using AI and may contain mistakes.
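To make the tips above concrete, here is a minimal sketch of the two-function layout this guide describes (fastapi_app plus bot_runner). The image contents, the module paths src.bot_vllm and src.runner, and the function bodies are assumptions for illustration; the example repo's actual code differs.

import modal

app = modal.App("pipecat-modal")

# A CPU-only debian_slim image is enough for the FastAPI app and most Pipecat pipelines
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "pipecat-ai[daily,openai]", "fastapi"
)


@app.function(image=image, min_containers=1)  # keep one warm instance to reduce cold starts
async def bot_runner(room_url: str, token: str):
    # Each pipeline run executes in its own isolated container
    from src.bot_vllm import run_bot  # assumed entrypoint packaged into the image

    await run_bot(room_url, token)


@app.function(image=image)
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI

    web_app = FastAPI()

    @web_app.post("/connect")
    async def connect():
        from src.runner import create_room_and_token  # assumed helper (see deployment pattern)

        room_url, token = await create_room_and_token()
        bot_runner.spawn(room_url, token)  # launch the pipeline as a separate Modal Function call
        return {"room_url": room_url, "token": token}

    return web_app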
|
deployment_wwwflyio_c55ec17a.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/deployment/www.fly.io#how-it-works
|
2 |
+
Title: Overview - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team.
|
features_gemini-multimodal-live_58404a9e.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/features/gemini-multimodal-live#what-we%E2%80%99ll-build
|
2 |
+
Title: Building with Gemini Multimodal Live - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Building with Gemini Multimodal Live - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Features Building with Gemini Multimodal Live Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal This guide will walk you through building a real-time AI chatbot using Gemini Multimodal Live and Pipecat. Weβll create a complete application with a Pipecat server and a Pipecat React client that enables natural conversations with an AI assistant. API Reference Gemini Multimodal Live API documentation Example Code Find the complete client and server code in Github Client SDK Pipecat React SDK documentation β What Weβll Build In this guide, youβll create: A FastAPI server that manages bot instances A Gemini-powered conversational AI bot A React client with real-time audio/video A complete pipeline for speech-to-speech interaction β Key Concepts Before we dive into implementation, letβs cover some important concepts that will help you understand how Pipecat and Gemini work together. β Understanding Pipelines At the heart of Pipecat is the pipeline system. A pipeline is a sequence of processors that handle different aspects of the conversation flow. Think of it like an assembly line where each station (processor) performs a specific task. For our chatbot, the pipeline looks like this: Copy Ask AI pipeline = Pipeline([ transport.input(), # Receives audio/video from the user via WebRTC rtvi, # Handles client/server messaging and events context_aggregator.user(), # Manages user message history llm, # Processes speech through Gemini talking_animation, # Controls bot's avatar transport.output(), # Sends audio/video back to the user via WebRTC context_aggregator.assistant(), # Manages bot message history ]) β Processors Each processor in the pipeline handles a specific task: Transport transport.input() and transport.output() handle media streaming with Daily Context context_aggregator maintains conversation history for natural dialogue Speech Processing rtvi_user_transcription and rtvi_bot_transcription handle speech-to-text Animation talking_animation controls the botβs visual state based on speaking activity The order of processors matters! Data flows through the pipeline in sequence, so each processor should receive the data it needs from previous processors. Learn more about the Core Concepts to Pipecat server. β Gemini Integration The GeminiMultimodalLiveLLMService is a speech-to-speech LLM service that interfaces with the Gemini Multimodal Live API. It provides: Real-time speech-to-speech conversation Context management Voice activity detection Tool use Pipecat manages two types of connections: A WebRTC connection between the Pipecat client and server for reliable audio/video streaming A WebSocket connection between the Pipecat server and Gemini for real-time AI processing This architecture ensures stable media streaming while maintaining responsive AI interactions. 
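As a rough sketch of how a pipeline like the one above actually gets executed (the pipeline variable is assumed from the snippet above; bot-gemini.py, covered later in this guide, does the same thing with more options): wrap it in a PipelineTask and hand it to a PipelineRunner.

from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask


async def run_bot():
    # Assumes `pipeline` was assembled as shown in the snippet above
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    runner = PipelineRunner()
    await runner.run(task)  # returns when the session ends or the task is cancelled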
β Prerequisites Before we begin, youβll need: Python 3.10 or higher Node.js 16 or higher A Daily API key A Google API key with Gemini Multimodal Live access Clone the Pipecat repo: Copy Ask AI git clone [email protected]:pipecat-ai/pipecat.git β Server Implementation Letβs start by setting up the server components. Our server will handle bot management, room creation, and client connections. β Environment Setup Navigate to the simple-chatbotβs server directory: Copy Ask AI cd examples/simple-chatbot/server Set up a python virtual environment: Copy Ask AI python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate Install requirements: Copy Ask AI pip install -r requirements.txt Copy env.example to .env and make a few changes: Copy Ask AI # Remove the hard-coded example room URL DAILY_SAMPLE_ROOM_URL = # Add your Daily and Gemini API keys DAILY_API_KEY = [your key here] GEMINI_API_KEY = [your key here] # Use Gemini implementation BOT_IMPLEMENTATION = gemini β Server Setup (server.py) server.py is a FastAPI server that creates the meeting room where clients and bots interact, manages bot instances, and handles client connections. Itβs the orchestrator that brings everything on the server-side together. β Creating Meeting Room The server uses Dailyβs API via a REST API helper to create rooms where clients and bots can meet. Each room is a secure space for audio/video communication: server/server.py Copy Ask AI async def create_room_and_token (): """Create a Daily room and generate access credentials.""" room = await daily_helpers[ "rest" ].create_room(DailyRoomParams()) token = await daily_helpers[ "rest" ].get_token(room.url) return room.url, token β Managing Bot Instances When a client connects, the server starts a new bot instance configured specifically for that room. It keeps track of running bots and ensures thereβs only one bot per room: server/server.py Copy Ask AI # Start the bot process for a specific room bot_file = "bot-gemini.py" proc = subprocess.Popen([ f "python3 -m { bot_file } -u { room_url } -t { token } " ]) bot_procs[proc.pid] = (proc, room_url) β Connection Endpoints The server provides two ways to connect: Browser Access (/) Creates a room, starts a bot, and redirects the browser to the Daily meeting URL. Perfect for quick testing and development. RTVI Client (/connect) Creates a room, starts a bot, and returns connection credentials. Used by RTVI clients for custom implementations. β Bot Implementation (bot-gemini.py) The bot implementation connects all the pieces: Daily transport, Gemini service, conversation context, and processors. Letβs break down each component: β Transport Setup First, we configure the Daily transport, which handles WebRTC communication between the client and server. 
server/bot-gemini.py Copy Ask AI transport = DailyTransport( room_url, token, "Chatbot" , DailyParams( audio_in_enabled = True , # Enable audio input audio_out_enabled = True , # Enable audio output video_out_enabled = True , # Enable video output vad_analyzer = SileroVADAnalyzer( params = VADParams( stop_secs = 0.5 )), ), ) Gemini Multimodal Live audio requirements: Input: 16 kHz sample rate Output: 24 kHz sample rate β Gemini Service Configuration Next, we initialize the Gemini service which will provide speech-to-speech inference and communication: server/bot-gemini.py Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GEMINI_API_KEY" ), voice_id = "Puck" , # Choose your bot's voice params = InputParams( temperature = 0.7 ) # Set model input params ) β Conversation Context We give our bot its personality and initial instructions: server/bot-gemini.py Copy Ask AI messages = [{ "role" : "user" , "content" : """You are Chatbot, a friendly, helpful robot. Keep responses brief and avoid special characters since output will be converted to audio.""" }] context = OpenAILLMContext(messages) context_aggregator = llm.create_context_aggregator(context) OpenAILLMContext is used as a common LLM base service for context management. In the future, we may add a specific context manager for Gemini. The context aggregator automatically maintains conversation history, helping the bot remember previous interactions. β Processor Setup We initialize two additional processors in our pipeline to handle different aspects of the interaction: RTVI Processors RTVIProcessor : Handles all client communication events including transcriptions, speaking states, and performance metrics Animation TalkingAnimation : Controls the botβs visual state, switching between static and animated frames based on speaking status Learn more about the RTVI framework and available processors. β Pipeline Assembly Finally, we bring everything together in a pipeline: server/bot-gemini.py Copy Ask AI pipeline = Pipeline([ transport.input(), # Receive media rtvi, # Client UI events context_aggregator.user(), # Process user context llm, # Gemini processing ta, # Animation (talking/quiet states) transport.output(), # Send media context_aggregator.assistant() # Process bot context ]) task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True , enable_metrics = True , enable_usage_metrics = True , ), observers = [RTVIObserver(rtvi)], ) The order of processors is crucial! For example, the RTVI processor should be early in the pipeline to capture all relevant events. The RTVIObserver monitors the entire pipeline and automatically collects relevant events to send to the client. β Client Implementation Our React client uses the Pipecat React SDK to communicate with the bot. Letβs explore how the client connects and interacts with our Pipecat server. β Connection Setup The client needs to connect to our bot server using the same transport type (Daily WebRTC) that we configured on the server: examples/react/src/providers/PipecatProvider.tsx Copy Ask AI const client = new PipecatClient ({ transport: new DailyTransport (), enableMic: true , // Enable audio input enableCam: false , // Disable video input enableScreenShare: false , // Disable screen sharing }); client . 
connect ({ endpoint: "http://localhost:7860/connect" , // Your bot connection endpoint }); The connection configuration must match your server: DailyTransport : Matches the WebRTC transport used in bot-gemini.py connect endpoint: Matches the /connect route in server.py Media settings: Controls which devices are enabled on join β Media Handling Pipecatβs React components handle all the complex media stream management for you: Copy Ask AI function App () { return ( < PipecatClientProvider client = { client } > < div className = "app" > < PipecatClientVideo participant = "bot" /> { /* Bot's video feed */ } < PipecatClientAudio /> { /* Audio input/output */ } </ div > </ PipecatClientProvider > ); } The PipecatClientProvider is the root component for providing Pipecat client context to your application. By wrapping your PipecatClientAudio and PipecatClientVideo components in this provider, they can access the client instance and receive and process the streams received from the Pipecat server. β Real-time Events The RTVI processors we configured in the pipeline emit events that we can handle in our client: Copy Ask AI // Listen for transcription events useRTVIClientEvent ( RTVIEvent . UserTranscript , ( data : TranscriptData ) => { if ( data . final ) { console . log ( `User said: ${ data . text } ` ); } }); // Listen for bot responses useRTVIClientEvent ( RTVIEvent . BotTranscript , ( data : BotLLMTextData ) => { console . log ( `Bot responded: ${ data . text } ` ); }); Available Events Speaking state changes Transcription updates Bot responses Connection status Performance metrics Event Usage Use these events to: Show speaking indicators Display transcripts Update UI state Monitor performance Optionally, uses callbacks to handle events in your application. Learn more in the Pipecat client docs. β Complete Example Hereβs a basic client implementation with connection status and transcription display: Copy Ask AI function ChatApp () { return ( < PipecatClientProvider client = { client } > < div className = "app" > { /* Connection UI */ } < StatusDisplay /> < ConnectButton /> { /* Media Components */ } < BotVideo /> < PipecatClientAudio /> { /* Debug/Transcript Display */ } < DebugDisplay /> </ div > </ PipecatClientProvider > ); } Check out the example repository for a complete client implementation with styling and error handling. β Running the Application From the simple-chatbot directory, start the server and client to test the chatbot: β 1. Start the Server In one terminal: Copy Ask AI python server/server.py β 2. Start the Client In another terminal: Copy Ask AI cd examples/react npm install npm run dev β 3. 
Testing the Connection Open http://localhost:5173 in your browser Click βConnectβ to join a room Allow microphone access when prompted Start talking with your AI assistant Troubleshooting: Check that all API keys are properly configured in .env Grant your browser access to your microphone, so it can receive your audio input Verify WebRTC ports arenβt blocked by firewalls β Next Steps Now that you have a working chatbot, consider these enhancements: Add custom avatar animations Implement function calling for external integrations Add support for multiple languages Enhance error recovery and reconnection logic β Examples Foundational Example A basic implementation demonstrating core Gemini Multimodal Live features and transcription capabilities Simple Chatbot A complete client/server implementation showing how to build a Pipecat JS or React client that connects to a Gemini Live Pipecat bot β Learn More Gemini Multimodal Live API Reference React Client SDK Documentation Recording Transcripts Metrics On this page What Weβll Build Key Concepts Understanding Pipelines Processors Gemini Integration Prerequisites Server Implementation Environment Setup Server Setup (server.py) Creating Meeting Room Managing Bot Instances Connection Endpoints Bot Implementation (bot-gemini.py) Transport Setup Gemini Service Configuration Conversation Context Processor Setup Pipeline Assembly Client Implementation Connection Setup Media Handling Real-time Events Complete Example Running the Application 1. Start the Server 2. Start the Client 3. Testing the Connection Next Steps Examples Learn More Assistant Responses are generated using AI and may contain mistakes.
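If you take the function-calling next step, the general Pipecat pattern looks roughly like the sketch below, reusing the llm and messages objects from bot-gemini.py. The weather example, schema, and import paths are assumptions and may vary across Pipecat versions; check the Function Calling guide for the current API.

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.llm_service import FunctionCallParams


async def fetch_weather(params: FunctionCallParams):
    location = params.arguments.get("location", "San Francisco")
    # Call a real weather API here; static data keeps the sketch self-contained
    await params.result_callback({"location": location, "conditions": "sunny", "temp_f": 72})


weather_function = FunctionSchema(
    name="get_weather",
    description="Get the current weather for a location",
    properties={"location": {"type": "string", "description": "City and state"}},
    required=["location"],
)

# `llm` and `messages` come from bot-gemini.py as shown earlier in this guide
llm.register_function("get_weather", fetch_weather)
context = OpenAILLMContext(messages, tools=ToolsSchema(standard_tools=[weather_function]))
context_aggregator = llm.create_context_aggregator(context)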
|
features_openai-audio-models-and-apis_477b3ad5.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/features/openai-audio-models-and-apis#realtime-api
|
2 |
+
Title: Building With OpenAI Audio Models and APIs - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Building With OpenAI Audio Models and APIs - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Features Building With OpenAI Audio Models and APIs Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal This guide provides an overview of the audio capabilities OpenAI offers via their APIs. Weβll also link to Pipecat sample code. β Two Ways To Build Voice-to-voice You can build voice-to-voice applications in two ways: The cascaded models approach, using separate models for transcription, the LLM, and voice generation. A cascaded pipeline looks like this, in Pipecat code. Hereβs a single-file example that uses a cascaded pipeline . (See below for an overview of Pipecat core concepts.) Copy Ask AI pipeline = Pipeline( [ transport.input(), speech_to_text, context_aggregator.user(), llm, text_to_speech, context_aggregator.assistant(), transport.output(), ] ) Using a single, speech-to-speech model. This is conceptually much simpler. Though note that most applications also need to implement things like function calling, retrieval-augmented search, context management, and integration with existing systems. So the core pipeline is only part of an appβs complexity. Hereβs a speech-to-speech pipeline in Pipecat code. And hereβs a single-file example that uses the OpenAI Realtime API . Copy Ask AI pipeline = Pipeline( [ transport.input(), context_aggregator.user(), speech_to_speech_llm, context_aggregator.assistant(), transport.output(), ] ) Which approach should you choose? The cascaded models approach is preferable if you are implementing a complex workflow and need the best possible instruction following performance and function calling reliability. The gpt-4o model operating in text-to-text mode has the strongest instruction following and function calling performance. The speech-to-speech approach offers better audio understanding and human-like voice output. If your application is primarily free-form, open-ended conversation, these attributes might be more important than instruction following and function calling performance. Note also that gpt-4o-audio-preview and the OpenAI Realtime API are currently beta products. β OpenAI Audio Models and APIs β Transcription API Models: gpt-4o-transcribe , gpt-4o-mini-transcribe Pipecat service: OpenAISTTService ( reference docs ) OpenAI endpoint: /v1/audio/transcriptions ( docs ) β Chat Completions API Models: gpt-4o , gpt-4o-mini , gpt-4o-audio-preview Pipecat service: OpenAILLMService ( reference docs ) OpenAI endpoint: /v1/chat/completions ( docs ) β Realtime API Models: gpt-4o-realtime-preview , gpt-4o-mini-realtime-preview Pipecat service: OpenAIRealtimeBetaLLMService ( reference docs ) OpenAI docs ( overview ) β Speech API Models: gpt-4o-mini-tts Pipecat service: OpenAITTSService ( reference docs ) OpenAI endpoint: /v1/audio/speech ( docs ) β Sample code and starter kits If you have a code example or starter kit you would like this doc to link to, please let us know. 
We can add examples that help people get started with the OpenAI audio models and APIs. β Single-file examples OpenAI STT β LLM β TTS A complete implementation demonstrating the cascaded approach with OpenAI services OpenAI Realtime API A speech-to-speech implementation using OpenAIβs Realtime API β OpenAI + Twilio + Pipecat Cloud This starter kit is a complete telephone voice agent that can talk about the NCAA March Madness basketball tournaments and look up realtime game information using function calls. The starter kit includes two bot configurations: cascaded model and speech-to-speech. The code can be packaged for deployment to Pipecat Cloud, a commercial platform for Pipecat agent hosting. Noise cancellation with Krisp Pipecat Flows On this page Two Ways To Build Voice-to-voice OpenAI Audio Models and APIs Transcription API Chat Completions API Realtime API Speech API Sample code and starter kits Single-file examples OpenAI + Twilio + Pipecat Cloud Assistant Responses are generated using AI and may contain mistakes.
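As a small sketch of wiring up the cascaded approach with the services listed above (import paths and constructor arguments can differ between Pipecat versions, so treat these as assumptions and check the reference docs):

import os

from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService

api_key = os.getenv("OPENAI_API_KEY")

speech_to_text = OpenAISTTService(api_key=api_key, model="gpt-4o-transcribe")
llm = OpenAILLMService(api_key=api_key, model="gpt-4o")
text_to_speech = OpenAITTSService(api_key=api_key, model="gpt-4o-mini-tts", voice="alloy")

# These drop into the cascaded pipeline shown earlier as
# speech_to_text, llm, and text_to_speech respectively.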
|
features_pipecat-flows_5892a5c1.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/features/pipecat-flows#context-strategies
|
2 |
+
Title: Pipecat Flows - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Pipecat Flows - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Features Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Flows provides a framework for building structured conversations in your AI applications. It enables you to create both predefined conversation paths and dynamically generated flows while handling the complexities of state management and LLM interactions. The framework consists of: A Python module for building conversation flows with Pipecat A visual editor for designing and exporting flow configurations β Key Concepts Nodes : Represent conversation states with specific messages and available functions Messages : Set the role and tasks for each node Functions : Define actions and transitions (Node functions for operations, Edge functions for transitions) Actions : Execute operations during state transitions (pre/post actions) State Management : Handle conversation state and data persistence β Example Flows Movie Explorer (Static) A static flow demonstrating movie exploration using OpenAI. Shows real API integration with TMDB, structured data collection, and state management. Insurance Policy (Dynamic) A dynamic flow using Google Gemini that adapts policy recommendations based on user responses. Demonstrates runtime node creation and conditional paths. These examples are fully functional and can be run locally. Make sure you have the required dependencies installed and API keys configured. β When to Use Static vs Dynamic Flows Static Flows are ideal when: Conversation structure is known upfront Paths follow predefined patterns Flow can be fully configured in advance Example: Customer service scripts, intake forms Dynamic Flows are better when: Paths depend on external data Flow structure needs runtime modification Complex decision trees are involved Example: Personalized recommendations, adaptive workflows β Installation If youβre already using Pipecat: Copy Ask AI pip install pipecat-ai-flows If youβre starting fresh: Copy Ask AI # Basic installation pip install pipecat-ai-flows # Install Pipecat with specific LLM provider options: pip install "pipecat-ai[daily,openai,deepgram]" # For OpenAI pip install "pipecat-ai[daily,anthropic,deepgram]" # For Anthropic pip install "pipecat-ai[daily,google,deepgram]" # For Google π‘ Want to design your flows visually? Try the online Flow Editor β Core Concepts β Designing Conversation Flows Functions in Pipecat Flows serve two key purposes: Processing data (likely by interfacing with external systems and APIs) Advancing the conversation to the next node Each function can do one or both. LLMs decide when to run each function, via their function calling (or tool calling) mechanism. β Defining a Function A function is expected to return a (result, next_node) tuple. 
More precisely, itβs expected to return: Copy Ask AI # (result, next_node) Tuple[Optional[FlowResult], Optional[Union[NodeConfig, str ]]] If the function processes data, it should return a non- None value for the first element of the tuple. This value should be a FlowResult or subclass. If the function advances the conversation to the next node, it should return a non- None value for the second element of the tuple. This value can be either: A NodeConfig defining the next node (for dynamic flows) A string identifying the next node (for static flows) β Example Function Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig async def check_availability ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: # Read arguments date = args[ "date" ] time = args[ "time" ] # Read previously-stored data party_size = flow_manager.state.get( "party_size" ) # Use flow_manager for immediate user feedback await flow_manager.task.queue_frame(TTSSpeakFrame( "Checking our reservation system..." )) # Store data in flow state for later use flow_manager.state[ "requested_date" ] = date # Interface with reservation system is_available = await reservation_system.check_availability(date, time, party_size) # Assemble result result = { "status" : "success" , "available" : available } # Decide which node to go to next if is_available: next_node = create_confirmation_node() else : next_node = create_no_availability_node() # Return both result and next node return result, next_node β Node Structure Each node in your flow represents a conversation state and consists of three main components: β Messages Nodes use two types of messages to control the conversation: Role Messages : Define the botβs personality or role (optional) Copy Ask AI "role_messages" : [ { "role" : "system" , "content" : "You are a friendly pizza ordering assistant. Keep responses casual and upbeat." } ] Task Messages : Define what the bot should do in the current node Copy Ask AI "task_messages" : [ { "role" : "system" , "content" : "Ask the customer which pizza size they'd like: small, medium, or large." } ] Role messages are typically defined in your initial node and inherited by subsequent nodes, while task messages are specific to each nodeβs purpose. β Functions Functions in Pipecat Flows can: Process data Specify node transitions Do both This leads to two conceptual types of functions: Node functions , which only process data. Edge functions , which also (or only) transition to the next node. The function itself ( which you can read more about here ) is usually wrapped in a function configuration, which also contains some metadata about the function. β Function Configuration Pipecat Flows supports three ways of specifying function configuration: Provider-specific dictionary format Copy Ask AI # Dictionary format { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } FlowsFunctionSchema Copy Ask AI # Using FlowsFunctionSchema from pipecat_flows import FlowsFunctionSchema size_function = FlowsFunctionSchema( name = "select_size" , description = "Select pizza size" , properties = { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} }, required = [ "size" ], handler = select_size ) # Use in node configuration node_config = { "task_messages" : [ ... 
], "functions" : [size_function] } The FlowsFunctionSchema approach provides some advantages over the provider-specific dictionary format: Consistent structure across LLM providers Simplified parameter definition Cleaner, more readable code Both dictionary and FlowsFunctionSchema approaches are fully supported. FlowsFunctionSchema is recommended for new projects as it provides better type checking and a provider-independent format. Direct function usage (auto-configuration) This approach lets you bypass specifying a standalone function configuration. Instead, relevant function metadata is automatically extracted from the functionβs signature and docstring: name description properties (including individual property description s) required Note that the function signature is a bit different when using direct functions. The first parameter is the FlowManager , followed by any others necessary for the function. Copy Ask AI from pipecat_flows import FlowManager, FlowResult async def select_pizza_order ( flow_manager : FlowManager, size : str , pizza_type : str , additional_toppings : list[ str ] = [], ) -> tuple[FlowResult, str ]: """ Record the pizza order details. Args: size (str): Size of the pizza. Must be one of "small", "medium", or "large". pizza_type (str): Type of pizza. Must be one of "pepperoni", "cheese", "supreme", or "vegetarian". additional_toppings (list[str]): List of additional toppings. Defaults to empty list. """ ... # Use in node configuration node_config = { "task_messages" : [ ... ], "functions" : [select_pizza_order] } β Node Functions Functions that process data within a single conversational state, without switching nodes. When called, they: Execute their handler to do the data processing (typically by interfacing with an external system or API) Trigger an immediate LLM completion with the result Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, None ]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) return { "status" : "success" , "size" : size }, None # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } β Edge Functions Functions that specify a transition between nodes (optionally processing data first). 
When called, they: Execute their handler to do any data processing (optional) and determine the next node Add the function result to the LLM context Trigger LLM completion after both the function result and the next nodeβs messages are in the context Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) result = { "status" : "success" , "size" : size } next_node = create_confirmation_node() return result, next_node # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } β Actions Actions are operations that execute as part of the lifecycle of a node, with two distinct timing options: Pre-actions: execute when entering the node, before the LLM completion Post-actions: execute after the LLM completion β Pre-Actions Execute when entering the node, before LLM inference. Useful for: Providing immediate feedback while waiting for LLM responses Bridging gaps during longer function calls Setting up state or context Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." # Immediate feedback during processing } ], Note that when the node is configured with respond_immediately: False , the pre_actions still run when entering the node, which may be well before LLM inference, depending on how long the user takes to speak first. Avoid mixing tts_say actions with chat completions as this may result in a conversation flow that feels unnatural. tts_say are best used as filler words when the LLM will take time to generate an completion. β Post-Actions Execute after LLM inference completes. Useful for: Cleanup operations State finalization Ensuring proper sequence of operations Copy Ask AI "post_actions" : [ { "type" : "end_conversation" # Ensures TTS completes before ending } ] Note that when the node is configured with respond_immediately: False , the post_actions still only run after the first LLM inference, which may be a while depending on how long the user takes to speak first. β Timing Considerations Pre-actions : Execute immediately, before any LLM processing begins LLM Inference : Processes the nodeβs messages and functions Post-actions : Execute after LLM processing and TTS completion For example, when using end_conversation as a post-action, the sequence is: LLM generates response TTS speaks the response End conversation action executes This ordering ensures proper completion of all operations. β Action Types Flows comes equipped with pre-canned actions and you can also define your own action behavior. See the reference docs for more information. β Deciding Who Speaks First For each node in the conversation, you can decide whether the LLM should respond immediately upon entering the node (the default behavior) or whether the LLM should wait for the user to speak first before responding. You do this using the respond_immediately field. respond_immediately=False may be particularly useful in the very first node, especially in outbound-calling cases where the user has to first answer the phone to trigger the conversation. 
Copy Ask AI NodeConfig( task_messages = [ { "role" : "system" , "content" : "Warmly greet the customer and ask how many people are in their party. This is your only job for now; if the customer asks for something else, politely remind them you can't do it." , } ], respond_immediately = False , # ... other fields ) Keep in mind that if you specify respond_immediately=False , the user may not be aware of the conversational task at hand when entering the node (the bot hasnβt told them yet). While itβs always important to have guardrails in your node messages to keep the conversation on topic, letting the user speak first makes it even more so. β Context Management Pipecat Flows provides three strategies for managing conversation context during node transitions: β Context Strategies APPEND (default): Adds new messages to the existing context, maintaining the full conversation history RESET : Clears the context and starts fresh with the new nodeβs messages RESET_WITH_SUMMARY : Resets the context but includes an AI-generated summary of the previous conversation β Configuration Context strategies can be configured globally or per-node: Copy Ask AI from pipecat_flows import ContextStrategy, ContextStrategyConfig # Global strategy configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, context_strategy = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far, focusing on decisions made and important information collected." ) ) # Per-node strategy configuration node_config = { "task_messages" : [ ... ], "functions" : [ ... ], "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Provide a concise summary of the customer's order details and preferences." ) } β Strategy Selection Choose your strategy based on your conversation needs: Use APPEND when full conversation history is important Use RESET when previous context might confuse the current nodeβs purpose Use RESET_WITH_SUMMARY for long conversations where key points need to be preserved When using RESET_WITH_SUMMARY, if summary generation fails or times out, the system automatically falls back to RESET strategy for resilience. β State Management The state variable in FlowManager is a shared dictionary that persists throughout the conversation. 
Think of it as a conversation memory that lets you: Store user information Track conversation progress Share data between nodes Inform decision-making Hereβs a practical example of a pizza ordering flow: Copy Ask AI # Store user choices as they're made async def select_size ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle pizza size selection.""" size = args[ "size" ] # Initialize order in state if it doesn't exist if "order" not in flow_manager.state: flow_manager.state[ "order" ] = {} # Store the selection flow_manager.state[ "order" ][ "size" ] = size return { "status" : "success" , "size" : size}, "toppings" async def select_toppings ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle topping selection.""" topping = args[ "topping" ] # Get existing order and toppings order = flow_manager.state.get( "order" , {}) toppings = order.get( "toppings" , []) # Add new topping toppings.append(topping) order[ "toppings" ] = toppings flow_manager.state[ "order" ] = order return { "status" : "success" , "toppings" : toppings}, "finalize" async def finalize_order ( args : FlowArgs) -> tuple[FlowResult, str ]: """Process the complete order.""" order = flow_manager.state.get( "order" , {}) # Validate order has required information if "size" not in order: return { "status" : "error" , "error" : "No size selected" } # Calculate price based on stored selections size = order[ "size" ] toppings = order.get( "toppings" , []) price = calculate_price(size, len (toppings)) return { "status" : "success" , "summary" : f "Ordered: { size } pizza with { ', ' .join(toppings) } " , "price" : price }, "end" In this example: select_size initializes the order and stores the size select_toppings builds a list of toppings finalize_order uses the stored information to process the complete order The state variable makes it easy to: Build up information across multiple interactions Access previous choices when needed Validate the complete order Calculate final results This is particularly useful when information needs to be collected across multiple conversation turns or when later decisions depend on earlier choices. β LLM Provider Support Pipecat Flows automatically handles format differences between LLM providers: β OpenAI Format Copy Ask AI "functions" : [{ "type" : "function" , "function" : { "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } } }] β Anthropic Format Copy Ask AI "functions" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "input_schema" : { ... } }] β Google (Gemini) Format Copy Ask AI "functions" : [{ "function_declarations" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } }] }] You donβt need to handle these differences manually - Pipecat Flows adapts your configuration to the correct format based on your LLM provider. β Implementation Approaches β Static Flows Static flows use a configuration-driven approach where the entire conversation structure is defined upfront. β Basic Setup Copy Ask AI from pipecat_flows import FlowManager # Define flow configuration flow_config = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... 
] } } } # Initialize flow manager with static configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize() β Example FlowConfig Copy Ask AI flow_config = { "initial_node" : "start" , "nodes" : { "start" : { "role_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. You must ALWAYS use the available functions to progress the conversation. This is a phone conversation and your responses will be converted to audio. Keep the conversation friendly, casual, and polite. Avoid outputting special characters and emojis." , } ], "task_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. Ask if they want pizza or sushi." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "choose_pizza" , "handler" : choose_pizza, # Returns [None, "pizza_order"] "description" : "User wants pizza" , "parameters" : { "type" : "object" , "properties" : {}} } } ] }, "pizza_order" : { "task_messages" : [ ... ], "functions" : [ { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, # Returns [FlowResult, "toppings"] "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } } } } ] } } } β Dynamic Flows Dynamic flows create and modify conversation paths at runtime based on data or business logic. β Example Implementation Hereβs a complete example of a dynamic insurance quote flow: Copy Ask AI from pipecat_flows import FlowManager, FlowArgs, FlowResult # Define handlers and transitions async def collect_age ( args : FlowArgs, flow_manager : FlowManager) -> tuple[AgeResult, NodeConfig]: """Process age collection.""" age = args[ "age" ] # Assemble result result = AgeResult( status = "success" , age = age) # Decide which node to go to next if age < 25 : await flow_manager.set_node_from_config(create_young_adult_node()) else : await flow_manager.set_node_from_config(create_standard_node()) return result, age # Node creation functions def create_initial_node () -> NodeConfig: """Create initial age collection node.""" return { "name" : "initial" , "role_messages" : [ { "role" : "system" , "content" : "You are an insurance quote assistant." } ], "task_messages" : [ { "role" : "system" , "content" : "Ask for the customer's age." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "collect_age" , "handler" : collect_age, "description" : "Collect customer age" , "parameters" : { "type" : "object" , "properties" : { "age" : { "type" : "integer" } } } } } ] } def create_young_adult_node () -> Dict[ str , Any]: """Create node for young adult quotes.""" return { "name" : "young_adult" , "task_messages" : [ { "role" : "system" , "content" : "Explain our special young adult coverage options." } ], "functions" : [ ... ] # Additional quote-specific functions } def create_standard_node () -> Dict[ str , Any]: """Create node for standard quotes.""" return { "name" : "standard" , "task_messages" : [ { "role" : "system" , "content" : "Present our standard coverage options." } ], "functions" : [ ... 
] # Additional quote-specific functions } # Initialize flow manager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize(create_initial_node()) β Best Practices Store shared data in flow_manager.state Create separate functions for node creation β Flow Editor The Pipecat Flow Editor provides a visual interface for creating and managing conversation flows. It offers a node-based interface that makes it easier to design, visualize, and modify your flows. β Visual Design β Node Types Start Node (Green): Entry point of your flow Copy Ask AI "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ] } Flow Nodes (Blue): Intermediate states Copy Ask AI "collect_info" : { "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] } End Node (Red): Final state Copy Ask AI "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [{ "type" : "end_conversation" }] } Function Nodes : Edge Functions (Purple): Create transitions Copy Ask AI { "name" : "next_node" , "description" : "Transition to next state" } Node Functions (Orange): Perform operations Copy Ask AI { "name" : "process_data" , "handler" : process_data_handler, "description" : "Process user data" } β Naming Conventions Start Node : Use descriptive names (e.g., βgreetingβ, βwelcomeβ) Flow Nodes : Name based on purpose (e.g., βcollect_infoβ, βverify_dataβ) End Node : Conventionally named βendβ Functions : Use clear, action-oriented names β Function Configuration Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_handler, "description" : "Process user data" , "parameters" : { ... } } } When using the Flow Editor, function handlers can be specified using the __function__: token: Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : "__function__:process_data" , # References function in main script "description" : "Process user data" , "parameters" : { ... } } } The handler will be looked up in your main script when the flow is executed. When function handlers are specified in the flow editor, they will be exported with the __function__: token. β Using the Editor β Creating a New Flow Start with a descriptively named Start Node Add Flow Nodes for each conversation state Connect nodes using Edge Functions Add Node Functions for operations Include an End Node β Import/Export Copy Ask AI # Export format { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] }, "process" : { "task_messages" : [ ... ], "functions" : [ ... ], }, "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [ ... 
] } } } Tips Use the visual preview to verify flow logic Test exported configurations Document node purposes and transitions Keep flows modular and maintainable Try the editor at flows.pipecat.ai
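The editor export references handlers with the __function__: token described above, which must resolve to functions in your main script at run time. Below is a minimal, illustrative sketch (not a pipecat-flows API) of loading an exported configuration and swapping those tokens for real callables before handing the config to FlowManager. The file name flow_export.json and the resolve_handlers helper are assumptions, and task, llm, and context_aggregator are presumed to be created elsewhere as in the examples above.

import json

from pipecat_flows import FlowManager


async def choose_pizza(args):
    # Handler referenced in the exported JSON as "__function__:choose_pizza"
    return None, "pizza_order"


def resolve_handlers(flow_config, namespace):
    """Replace "__function__:<name>" tokens with callables from `namespace`.

    Handles OpenAI-style ({"function": {...}}) and flat entries; adapt as
    needed for other provider formats.
    """
    for node in flow_config["nodes"].values():
        for func in node.get("functions", []):
            if not isinstance(func, dict):
                continue  # direct functions are already callables
            spec = func.get("function", func)
            handler = spec.get("handler")
            if isinstance(handler, str) and handler.startswith("__function__:"):
                spec["handler"] = namespace[handler.split(":", 1)[1]]
    return flow_config


with open("flow_export.json") as f:  # hypothetical export file from the editor
    flow_config = resolve_handlers(json.load(f), globals())

# task, llm, and context_aggregator are assumed to be created elsewhere
flow_manager = FlowManager(
    task=task, llm=llm, context_aggregator=context_aggregator, flow_config=flow_config
)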
features_pipecat-flows_94c674f2.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/features/pipecat-flows#defining-a-function
Title: Pipecat Flows - Pipecat
==================================================
Pipecat Flows - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Features Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Flows provides a framework for building structured conversations in your AI applications. It enables you to create both predefined conversation paths and dynamically generated flows while handling the complexities of state management and LLM interactions. The framework consists of: A Python module for building conversation flows with Pipecat A visual editor for designing and exporting flow configurations β Key Concepts Nodes : Represent conversation states with specific messages and available functions Messages : Set the role and tasks for each node Functions : Define actions and transitions (Node functions for operations, Edge functions for transitions) Actions : Execute operations during state transitions (pre/post actions) State Management : Handle conversation state and data persistence β Example Flows Movie Explorer (Static) A static flow demonstrating movie exploration using OpenAI. Shows real API integration with TMDB, structured data collection, and state management. Insurance Policy (Dynamic) A dynamic flow using Google Gemini that adapts policy recommendations based on user responses. Demonstrates runtime node creation and conditional paths. These examples are fully functional and can be run locally. Make sure you have the required dependencies installed and API keys configured. β When to Use Static vs Dynamic Flows Static Flows are ideal when: Conversation structure is known upfront Paths follow predefined patterns Flow can be fully configured in advance Example: Customer service scripts, intake forms Dynamic Flows are better when: Paths depend on external data Flow structure needs runtime modification Complex decision trees are involved Example: Personalized recommendations, adaptive workflows β Installation If youβre already using Pipecat: Copy Ask AI pip install pipecat-ai-flows If youβre starting fresh: Copy Ask AI # Basic installation pip install pipecat-ai-flows # Install Pipecat with specific LLM provider options: pip install "pipecat-ai[daily,openai,deepgram]" # For OpenAI pip install "pipecat-ai[daily,anthropic,deepgram]" # For Anthropic pip install "pipecat-ai[daily,google,deepgram]" # For Google π‘ Want to design your flows visually? Try the online Flow Editor β Core Concepts β Designing Conversation Flows Functions in Pipecat Flows serve two key purposes: Processing data (likely by interfacing with external systems and APIs) Advancing the conversation to the next node Each function can do one or both. LLMs decide when to run each function, via their function calling (or tool calling) mechanism. β Defining a Function A function is expected to return a (result, next_node) tuple. 
More precisely, itβs expected to return: Copy Ask AI # (result, next_node) Tuple[Optional[FlowResult], Optional[Union[NodeConfig, str ]]] If the function processes data, it should return a non- None value for the first element of the tuple. This value should be a FlowResult or subclass. If the function advances the conversation to the next node, it should return a non- None value for the second element of the tuple. This value can be either: A NodeConfig defining the next node (for dynamic flows) A string identifying the next node (for static flows) β Example Function Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig async def check_availability ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: # Read arguments date = args[ "date" ] time = args[ "time" ] # Read previously-stored data party_size = flow_manager.state.get( "party_size" ) # Use flow_manager for immediate user feedback await flow_manager.task.queue_frame(TTSSpeakFrame( "Checking our reservation system..." )) # Store data in flow state for later use flow_manager.state[ "requested_date" ] = date # Interface with reservation system is_available = await reservation_system.check_availability(date, time, party_size) # Assemble result result = { "status" : "success" , "available" : available } # Decide which node to go to next if is_available: next_node = create_confirmation_node() else : next_node = create_no_availability_node() # Return both result and next node return result, next_node β Node Structure Each node in your flow represents a conversation state and consists of three main components: β Messages Nodes use two types of messages to control the conversation: Role Messages : Define the botβs personality or role (optional) Copy Ask AI "role_messages" : [ { "role" : "system" , "content" : "You are a friendly pizza ordering assistant. Keep responses casual and upbeat." } ] Task Messages : Define what the bot should do in the current node Copy Ask AI "task_messages" : [ { "role" : "system" , "content" : "Ask the customer which pizza size they'd like: small, medium, or large." } ] Role messages are typically defined in your initial node and inherited by subsequent nodes, while task messages are specific to each nodeβs purpose. β Functions Functions in Pipecat Flows can: Process data Specify node transitions Do both This leads to two conceptual types of functions: Node functions , which only process data. Edge functions , which also (or only) transition to the next node. The function itself ( which you can read more about here ) is usually wrapped in a function configuration, which also contains some metadata about the function. β Function Configuration Pipecat Flows supports three ways of specifying function configuration: Provider-specific dictionary format Copy Ask AI # Dictionary format { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } FlowsFunctionSchema Copy Ask AI # Using FlowsFunctionSchema from pipecat_flows import FlowsFunctionSchema size_function = FlowsFunctionSchema( name = "select_size" , description = "Select pizza size" , properties = { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} }, required = [ "size" ], handler = select_size ) # Use in node configuration node_config = { "task_messages" : [ ... 
], "functions" : [size_function] } The FlowsFunctionSchema approach provides some advantages over the provider-specific dictionary format: Consistent structure across LLM providers Simplified parameter definition Cleaner, more readable code Both dictionary and FlowsFunctionSchema approaches are fully supported. FlowsFunctionSchema is recommended for new projects as it provides better type checking and a provider-independent format. Direct function usage (auto-configuration) This approach lets you bypass specifying a standalone function configuration. Instead, relevant function metadata is automatically extracted from the functionβs signature and docstring: name description properties (including individual property description s) required Note that the function signature is a bit different when using direct functions. The first parameter is the FlowManager , followed by any others necessary for the function. Copy Ask AI from pipecat_flows import FlowManager, FlowResult async def select_pizza_order ( flow_manager : FlowManager, size : str , pizza_type : str , additional_toppings : list[ str ] = [], ) -> tuple[FlowResult, str ]: """ Record the pizza order details. Args: size (str): Size of the pizza. Must be one of "small", "medium", or "large". pizza_type (str): Type of pizza. Must be one of "pepperoni", "cheese", "supreme", or "vegetarian". additional_toppings (list[str]): List of additional toppings. Defaults to empty list. """ ... # Use in node configuration node_config = { "task_messages" : [ ... ], "functions" : [select_pizza_order] } β Node Functions Functions that process data within a single conversational state, without switching nodes. When called, they: Execute their handler to do the data processing (typically by interfacing with an external system or API) Trigger an immediate LLM completion with the result Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, None ]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) return { "status" : "success" , "size" : size }, None # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } β Edge Functions Functions that specify a transition between nodes (optionally processing data first). 
When called, they: Execute their handler to do any data processing (optional) and determine the next node Add the function result to the LLM context Trigger LLM completion after both the function result and the next nodeβs messages are in the context Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) result = { "status" : "success" , "size" : size } next_node = create_confirmation_node() return result, next_node # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } β Actions Actions are operations that execute as part of the lifecycle of a node, with two distinct timing options: Pre-actions: execute when entering the node, before the LLM completion Post-actions: execute after the LLM completion β Pre-Actions Execute when entering the node, before LLM inference. Useful for: Providing immediate feedback while waiting for LLM responses Bridging gaps during longer function calls Setting up state or context Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." # Immediate feedback during processing } ], Note that when the node is configured with respond_immediately: False , the pre_actions still run when entering the node, which may be well before LLM inference, depending on how long the user takes to speak first. Avoid mixing tts_say actions with chat completions as this may result in a conversation flow that feels unnatural. tts_say are best used as filler words when the LLM will take time to generate an completion. β Post-Actions Execute after LLM inference completes. Useful for: Cleanup operations State finalization Ensuring proper sequence of operations Copy Ask AI "post_actions" : [ { "type" : "end_conversation" # Ensures TTS completes before ending } ] Note that when the node is configured with respond_immediately: False , the post_actions still only run after the first LLM inference, which may be a while depending on how long the user takes to speak first. β Timing Considerations Pre-actions : Execute immediately, before any LLM processing begins LLM Inference : Processes the nodeβs messages and functions Post-actions : Execute after LLM processing and TTS completion For example, when using end_conversation as a post-action, the sequence is: LLM generates response TTS speaks the response End conversation action executes This ordering ensures proper completion of all operations. β Action Types Flows comes equipped with pre-canned actions and you can also define your own action behavior. See the reference docs for more information. β Deciding Who Speaks First For each node in the conversation, you can decide whether the LLM should respond immediately upon entering the node (the default behavior) or whether the LLM should wait for the user to speak first before responding. You do this using the respond_immediately field. respond_immediately=False may be particularly useful in the very first node, especially in outbound-calling cases where the user has to first answer the phone to trigger the conversation. 
Copy Ask AI NodeConfig( task_messages = [ { "role" : "system" , "content" : "Warmly greet the customer and ask how many people are in their party. This is your only job for now; if the customer asks for something else, politely remind them you can't do it." , } ], respond_immediately = False , # ... other fields ) Keep in mind that if you specify respond_immediately=False , the user may not be aware of the conversational task at hand when entering the node (the bot hasnβt told them yet). While itβs always important to have guardrails in your node messages to keep the conversation on topic, letting the user speak first makes it even more so. β Context Management Pipecat Flows provides three strategies for managing conversation context during node transitions: β Context Strategies APPEND (default): Adds new messages to the existing context, maintaining the full conversation history RESET : Clears the context and starts fresh with the new nodeβs messages RESET_WITH_SUMMARY : Resets the context but includes an AI-generated summary of the previous conversation β Configuration Context strategies can be configured globally or per-node: Copy Ask AI from pipecat_flows import ContextStrategy, ContextStrategyConfig # Global strategy configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, context_strategy = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far, focusing on decisions made and important information collected." ) ) # Per-node strategy configuration node_config = { "task_messages" : [ ... ], "functions" : [ ... ], "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Provide a concise summary of the customer's order details and preferences." ) } β Strategy Selection Choose your strategy based on your conversation needs: Use APPEND when full conversation history is important Use RESET when previous context might confuse the current nodeβs purpose Use RESET_WITH_SUMMARY for long conversations where key points need to be preserved When using RESET_WITH_SUMMARY, if summary generation fails or times out, the system automatically falls back to RESET strategy for resilience. β State Management The state variable in FlowManager is a shared dictionary that persists throughout the conversation. 
Think of it as a conversation memory that lets you: Store user information Track conversation progress Share data between nodes Inform decision-making Hereβs a practical example of a pizza ordering flow: Copy Ask AI # Store user choices as they're made async def select_size ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle pizza size selection.""" size = args[ "size" ] # Initialize order in state if it doesn't exist if "order" not in flow_manager.state: flow_manager.state[ "order" ] = {} # Store the selection flow_manager.state[ "order" ][ "size" ] = size return { "status" : "success" , "size" : size}, "toppings" async def select_toppings ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle topping selection.""" topping = args[ "topping" ] # Get existing order and toppings order = flow_manager.state.get( "order" , {}) toppings = order.get( "toppings" , []) # Add new topping toppings.append(topping) order[ "toppings" ] = toppings flow_manager.state[ "order" ] = order return { "status" : "success" , "toppings" : toppings}, "finalize" async def finalize_order ( args : FlowArgs) -> tuple[FlowResult, str ]: """Process the complete order.""" order = flow_manager.state.get( "order" , {}) # Validate order has required information if "size" not in order: return { "status" : "error" , "error" : "No size selected" } # Calculate price based on stored selections size = order[ "size" ] toppings = order.get( "toppings" , []) price = calculate_price(size, len (toppings)) return { "status" : "success" , "summary" : f "Ordered: { size } pizza with { ', ' .join(toppings) } " , "price" : price }, "end" In this example: select_size initializes the order and stores the size select_toppings builds a list of toppings finalize_order uses the stored information to process the complete order The state variable makes it easy to: Build up information across multiple interactions Access previous choices when needed Validate the complete order Calculate final results This is particularly useful when information needs to be collected across multiple conversation turns or when later decisions depend on earlier choices. β LLM Provider Support Pipecat Flows automatically handles format differences between LLM providers: β OpenAI Format Copy Ask AI "functions" : [{ "type" : "function" , "function" : { "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } } }] β Anthropic Format Copy Ask AI "functions" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "input_schema" : { ... } }] β Google (Gemini) Format Copy Ask AI "functions" : [{ "function_declarations" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } }] }] You donβt need to handle these differences manually - Pipecat Flows adapts your configuration to the correct format based on your LLM provider. β Implementation Approaches β Static Flows Static flows use a configuration-driven approach where the entire conversation structure is defined upfront. β Basic Setup Copy Ask AI from pipecat_flows import FlowManager # Define flow configuration flow_config = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... 
] } } } # Initialize flow manager with static configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize() β Example FlowConfig Copy Ask AI flow_config = { "initial_node" : "start" , "nodes" : { "start" : { "role_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. You must ALWAYS use the available functions to progress the conversation. This is a phone conversation and your responses will be converted to audio. Keep the conversation friendly, casual, and polite. Avoid outputting special characters and emojis." , } ], "task_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. Ask if they want pizza or sushi." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "choose_pizza" , "handler" : choose_pizza, # Returns [None, "pizza_order"] "description" : "User wants pizza" , "parameters" : { "type" : "object" , "properties" : {}} } } ] }, "pizza_order" : { "task_messages" : [ ... ], "functions" : [ { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, # Returns [FlowResult, "toppings"] "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } } } } ] } } } β Dynamic Flows Dynamic flows create and modify conversation paths at runtime based on data or business logic. β Example Implementation Hereβs a complete example of a dynamic insurance quote flow: Copy Ask AI from pipecat_flows import FlowManager, FlowArgs, FlowResult # Define handlers and transitions async def collect_age ( args : FlowArgs, flow_manager : FlowManager) -> tuple[AgeResult, NodeConfig]: """Process age collection.""" age = args[ "age" ] # Assemble result result = AgeResult( status = "success" , age = age) # Decide which node to go to next if age < 25 : await flow_manager.set_node_from_config(create_young_adult_node()) else : await flow_manager.set_node_from_config(create_standard_node()) return result, age # Node creation functions def create_initial_node () -> NodeConfig: """Create initial age collection node.""" return { "name" : "initial" , "role_messages" : [ { "role" : "system" , "content" : "You are an insurance quote assistant." } ], "task_messages" : [ { "role" : "system" , "content" : "Ask for the customer's age." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "collect_age" , "handler" : collect_age, "description" : "Collect customer age" , "parameters" : { "type" : "object" , "properties" : { "age" : { "type" : "integer" } } } } } ] } def create_young_adult_node () -> Dict[ str , Any]: """Create node for young adult quotes.""" return { "name" : "young_adult" , "task_messages" : [ { "role" : "system" , "content" : "Explain our special young adult coverage options." } ], "functions" : [ ... ] # Additional quote-specific functions } def create_standard_node () -> Dict[ str , Any]: """Create node for standard quotes.""" return { "name" : "standard" , "task_messages" : [ { "role" : "system" , "content" : "Present our standard coverage options." } ], "functions" : [ ... 
] # Additional quote-specific functions } # Initialize flow manager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize(create_initial_node()) β Best Practices Store shared data in flow_manager.state Create separate functions for node creation β Flow Editor The Pipecat Flow Editor provides a visual interface for creating and managing conversation flows. It offers a node-based interface that makes it easier to design, visualize, and modify your flows. β Visual Design β Node Types Start Node (Green): Entry point of your flow Copy Ask AI "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ] } Flow Nodes (Blue): Intermediate states Copy Ask AI "collect_info" : { "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] } End Node (Red): Final state Copy Ask AI "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [{ "type" : "end_conversation" }] } Function Nodes : Edge Functions (Purple): Create transitions Copy Ask AI { "name" : "next_node" , "description" : "Transition to next state" } Node Functions (Orange): Perform operations Copy Ask AI { "name" : "process_data" , "handler" : process_data_handler, "description" : "Process user data" } β Naming Conventions Start Node : Use descriptive names (e.g., βgreetingβ, βwelcomeβ) Flow Nodes : Name based on purpose (e.g., βcollect_infoβ, βverify_dataβ) End Node : Conventionally named βendβ Functions : Use clear, action-oriented names β Function Configuration Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_handler, "description" : "Process user data" , "parameters" : { ... } } } When using the Flow Editor, function handlers can be specified using the __function__: token: Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : "__function__:process_data" , # References function in main script "description" : "Process user data" , "parameters" : { ... } } } The handler will be looked up in your main script when the flow is executed. When function handlers are specified in the flow editor, they will be exported with the __function__: token. β Using the Editor β Creating a New Flow Start with a descriptively named Start Node Add Flow Nodes for each conversation state Connect nodes using Edge Functions Add Node Functions for operations Include an End Node β Import/Export Copy Ask AI # Export format { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] }, "process" : { "task_messages" : [ ... ], "functions" : [ ... ], }, "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [ ... 
] } } } Tips Use the visual preview to verify flow logic Test exported configurations Document node purposes and transitions Keep flows modular and maintainable Try the editor at flows.pipecat.ai
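To tie the pieces above together, here is a minimal static-flow sketch under the same assumptions as the examples in this guide: a consolidated handler returns a (result, next-node-name) tuple, and a two-node flow configuration wires the transition. task, llm, and context_aggregator are assumed to exist, and the commented backend call is hypothetical.

from pipecat_flows import FlowArgs, FlowManager, FlowResult


async def select_size(args: FlowArgs) -> tuple[FlowResult, str]:
    """Record the size and advance to the "confirm" node."""
    size = args["size"]
    # await ordering_system.record_size_selection(size)  # hypothetical backend call
    return {"status": "success", "size": size}, "confirm"


flow_config = {
    "initial_node": "choose_size",
    "nodes": {
        "choose_size": {
            "role_messages": [
                {"role": "system", "content": "You are a friendly pizza ordering assistant."}
            ],
            "task_messages": [
                {"role": "system", "content": "Ask which size the customer wants: small, medium, or large."}
            ],
            "functions": [
                {
                    "type": "function",
                    "function": {
                        "name": "select_size",
                        "handler": select_size,
                        "description": "Select pizza size",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "size": {"type": "string", "enum": ["small", "medium", "large"]}
                            },
                        },
                    },
                }
            ],
        },
        "confirm": {
            "task_messages": [
                {"role": "system", "content": "Confirm the order and thank the customer."}
            ],
            "functions": [],
            "post_actions": [{"type": "end_conversation"}],
        },
    },
}

# task, llm, and context_aggregator are assumed to be created elsewhere
flow_manager = FlowManager(
    task=task, llm=llm, context_aggregator=context_aggregator, flow_config=flow_config
)
# await flow_manager.initialize()  # e.g., inside on_first_participant_joined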
filters_frame-filter_3cd853b4.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/filters/frame-filter#notes
Title: FrameFilter - Pipecat
==================================================
FrameFilter - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Frame Filters FrameFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview FrameFilter is a processor that filters frames based on their types, only passing through frames that match specified types (plus some system frames like EndFrame and SystemFrame ). β Constructor Parameters β types Tuple[Type[Frame], ...] required Tuple of frame types that should be passed through the filter β Functionality When a frame passes through the filter, it is checked against the provided types. Only frames that match one of the specified types (or are system frames) will be passed downstream. All other frames are dropped. β Output Frames The processor always passes through: Frames matching any of the specified types EndFrame and SystemFrame instances (always allowed, so as to not block the pipeline) β Usage Example Copy Ask AI from pipecat.frames.frames import TextFrame, AudioRawFrame, Frame from pipecat.processors.filters import FrameFilter from typing import Tuple, Type # Create a filter that only passes TextFrames and AudioRawFrames text_and_audio_filter = FrameFilter( types = (TextFrame, AudioRawFrame) ) # Add to pipeline pipeline = Pipeline([ source, text_and_audio_filter, # Filters out all other frame types destination ]) β Frame Flow β Notes Simple but powerful way to restrict which frame types flow through parts of your pipeline Always allows system frames to pass through for proper pipeline operation Can be used to isolate specific parts of your pipeline from certain frame types Efficient implementation with minimal overhead SoundfileMixer FunctionFilter On this page Overview Constructor Parameters Functionality Output Frames Usage Example Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
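As a further illustration of the isolation use case mentioned in the notes, the sketch below places a FrameFilter in front of a downstream processor so it only ever receives TranscriptionFrames (plus the always-allowed system frames). The transport, STT service, and transcript_logger processor are assumed to exist elsewhere; only the filter construction follows the documented API.

from pipecat.frames.frames import TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.filters import FrameFilter

# Pass only transcriptions (system frames such as EndFrame still flow through)
transcription_only = FrameFilter(types=(TranscriptionFrame,))

pipeline = Pipeline([
    transport.input(),   # assumed transport
    stt_service,         # assumed STT service producing TranscriptionFrame
    transcription_only,  # shields the logger from audio and other frame types
    transcript_logger,   # hypothetical processor that records transcripts
])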
filters_function-filter_0433187d.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/filters/function-filter#overview
Title: FunctionFilter - Pipecat
==================================================
FunctionFilter - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Frame Filters FunctionFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview FunctionFilter is a flexible processor that uses a custom async function to determine which frames to pass through. This allows for complex, dynamic filtering logic beyond simple type checking. β Constructor Parameters β filter Callable[[Frame], Awaitable[bool]] required Async function that examines each frame and returns True to allow it or False to filter it out β direction FrameDirection default: "FrameDirection.DOWNSTREAM" Which direction of frames to filter (DOWNSTREAM or UPSTREAM) β Functionality When a frame passes through the processor: System frames and end frames are always passed through Frames moving in a different direction than specified are always passed through Other frames are passed to the filter function If the filter function returns True, the frame is passed through β Output Frames The processor conditionally passes through frames based on: Frame type (system frames and end frames always pass) Frame direction (only filters in the specified direction) Result of the custom filter function β Usage Example Copy Ask AI from pipecat.frames.frames import TextFrame, Frame from pipecat.processors.filters import FunctionFilter from pipecat.processors.frame_processor import FrameDirection # Create filter that only allows TextFrames with more than 10 characters async def long_text_filter ( frame : Frame) -> bool : if isinstance (frame, TextFrame): return len (frame.text) > 10 return False # Apply filter to downstream frames only text_length_filter = FunctionFilter( filter = long_text_filter, direction = FrameDirection. DOWNSTREAM ) # Add to pipeline pipeline = Pipeline([ source, text_length_filter, # Filters out short text frames destination ]) β Frame Flow β Notes Provides maximum flexibility for complex filtering logic Can incorporate dynamic conditions that change at runtime Only filters frames moving in the specified direction Always passes through system frames for proper pipeline operation Can be used to create sophisticated content-based filters Supports async filter functions for complex processing FrameFilter IdentityFilter On this page Overview Constructor Parameters Functionality Output Frames Usage Example Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
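Because the filter function is evaluated per frame, it can consult state that changes at runtime. The sketch below uses a module-level flag (an illustrative pattern, not a library feature) to drop TextFrames while "muted" and pass everything else through.

from pipecat.frames.frames import Frame, TextFrame
from pipecat.processors.filters import FunctionFilter
from pipecat.processors.frame_processor import FrameDirection

bot_muted = False  # toggle from an event handler to mute bot text output


async def drop_text_when_muted(frame: Frame) -> bool:
    if isinstance(frame, TextFrame):
        return not bot_muted  # drop text frames while muted
    return True  # let all other (non-system) frames pass


mute_filter = FunctionFilter(
    filter=drop_text_when_muted,
    direction=FrameDirection.DOWNSTREAM,  # only filter downstream traffic
)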
filters_wake-check-filter_be3b0bfa.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/filters/wake-check-filter#param-keepalive-timeout
Title: WakeCheckFilter - Pipecat
==================================================
WakeCheckFilter - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Frame Filters WakeCheckFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview WakeCheckFilter monitors TranscriptionFrame s for specified wake phrases and only allows frames to pass through after a wake phrase has been detected. It includes a keepalive timeout to maintain the awake state for a period after detection, allowing continuous conversation without requiring repeated wake phrases. β Constructor Parameters β wake_phrases list[str] required List of wake phrases to detect in transcriptions β keepalive_timeout float default: "3" Number of seconds to remain in the awake state after each transcription β Functionality The filter maintains state for each participant and processes frames as follows: TranscriptionFrame objects are checked for wake phrases If a wake phrase is detected, the filter enters the βAWAKEβ state While in the βAWAKEβ state, all transcription frames pass through After no activity for the keepalive timeout period, the filter returns to βIDLEβ All non-transcription frames pass through normally Wake phrases are detected using regular expressions that match whole words with flexible spacing, making detection resilient to minor transcription variations. β States β IDLE WakeState Default state - only non-transcription frames pass through β AWAKE WakeState Active state after wake phrase detection - all frames pass through β Output Frames All non-transcription frames pass through unchanged After wake phrase detection, transcription frames pass through When awake, transcription frames reset the keepalive timer β Usage Example Copy Ask AI from pipecat.processors.filters import WakeCheckFilter # Create filter with wake phrases wake_filter = WakeCheckFilter( wake_phrases = [ "hey assistant" , "ok computer" , "listen up" ], keepalive_timeout = 5.0 # Stay awake for 5 seconds after each transcription ) # Add to pipeline pipeline = Pipeline([ transport.input(), stt_service, wake_filter, # Only passes transcriptions after wake phrases llm_service, tts_service, transport.output() ]) β Frame Flow β Notes Maintains separate state for each participant ID Uses regex pattern matching for resilient wake phrase detection Accumulates transcription text to detect phrases across multiple frames Trims accumulated text when wake phrase is detected Supports multiple wake phrases Passes all non-transcription frames through unchanged Error handling produces ErrorFrames for robust operation Case-insensitive matching for natural language use STTMuteFilter WakeNotifierFilter On this page Overview Constructor Parameters Functionality States Output Frames Usage Example Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
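Placement matters: the filter inspects TranscriptionFrames, so it belongs downstream of the STT service and upstream of the LLM, ensuring speech before the wake phrase never reaches the model. Below is a sketch under the usual assumptions (transport, STT, LLM, TTS, and a context aggregator created elsewhere); only the filter construction follows the documented API.

from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.filters import WakeCheckFilter

wake_filter = WakeCheckFilter(
    wake_phrases=["hey pipecat"],
    keepalive_timeout=10.0,  # allow longer multi-turn follow-ups before going idle
)

pipeline = Pipeline([
    transport.input(),          # assumed transport
    stt_service,                # assumed STT service producing TranscriptionFrame
    wake_filter,                # gate everything on the wake phrase
    context_aggregator.user(),  # assumed context aggregator pair
    llm_service,
    tts_service,
    transport.output(),
    context_aggregator.assistant(),
])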
filters_wake-notifier-filter_deec0e70.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/filters/wake-notifier-filter#notes
Title: WakeNotifierFilter - Pipecat
==================================================
WakeNotifierFilter - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Frame Filters WakeNotifierFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview WakeNotifierFilter monitors the pipeline for specific frame types and triggers a notification when those frames pass a custom filter condition. It passes all frames through unchanged while performing this notification side-effect. β Constructor Parameters β notifier BaseNotifier required The notifier object to trigger when conditions are met β types Tuple[Type[Frame]] required Tuple of frame types to monitor β filter Callable[[Frame], Awaitable[bool]] required Async function that examines each matching frame and returns True to trigger notification β Functionality The processor operates as follows: Checks if the incoming frame matches any of the specified types If itβs a matching type, calls the filter function with the frame If the filter returns True, triggers the notifier Passes all frames through unchanged, regardless of the filtering result This allows for notification side-effects without modifying the pipelineβs data flow. β Output Frames All frames pass through unchanged in their original direction No frames are modified or filtered out β Usage Example Copy Ask AI from pipecat.frames.frames import TranscriptionFrame, UserStartedSpeakingFrame from pipecat.processors.filters import WakeNotifierFilter from pipecat.sync.event_notifier import EventNotifier # Create an event notifier wake_event = EventNotifier() # Create filter that notifies when certain wake phrases are detected async def wake_phrase_filter ( frame ): if isinstance (frame, TranscriptionFrame): return "hey assistant" in frame.text.lower() return False # Add to pipeline wake_notifier = WakeNotifierFilter( notifier = wake_event, types = (TranscriptionFrame, UserStartedSpeakingFrame), filter = wake_phrase_filter ) # In another component, wait for the notification async def handle_wake_event (): await wake_event.wait() print ( "Wake phrase detected!" ) β Frame Flow β Notes Acts as a transparent pass-through for all frames Can trigger external events without modifying pipeline flow Useful for signaling between pipeline components Can monitor for multiple frame types simultaneously Uses async filter function for complex conditions Functions as a βlistenerβ that doesnβt affect the data stream Can be used for logging, analytics, or coordinating external systems WakeCheckFilter OpenTelemetry On this page Overview Constructor Parameters Functionality Output Frames Usage Example Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
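Building on the example above, the notification can drive side effects elsewhere in the application. The sketch below runs a background task that waits on the same EventNotifier and queues a short spoken acknowledgement through the pipeline task; the loop and function name are illustrative, while queue_frame and TTSSpeakFrame are used as shown elsewhere in these docs.

import asyncio

from pipecat.frames.frames import TTSSpeakFrame


async def acknowledge_wake(task, wake_event):
    """Speak a short acknowledgement every time the wake phrase is detected."""
    while True:
        await wake_event.wait()  # released when WakeNotifierFilter triggers
        await task.queue_frame(TTSSpeakFrame("I'm listening."))


# After creating the PipelineTask (assumed to be named `task`):
# asyncio.create_task(acknowledge_wake(task, wake_event))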
flows_pipecat-flows_60eeeb51.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/frameworks/flows/pipecat-flows#param-handler-3
Title: Pipecat Flows - Pipecat
==================================================
Pipecat Flows - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Frameworks Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline New to building conversational flows? Check out our Pipecat Flows guide first. β Installation Existing Pipecat installation Fresh Pipecat installation Copy Ask AI pip install pipecat-ai-flows β Core Types β FlowArgs β FlowArgs Dict[str, Any] Type alias for function handler arguments. β FlowResult β FlowResult TypedDict Base type for function handler results. Additional fields can be included as needed. Show Fields β status str Optional status field β error str Optional error message β FlowConfig β FlowConfig TypedDict Configuration for the entire conversation flow. Show Fields β initial_node str required Starting node identifier β nodes Dict[str, NodeConfig] required Map of node names to configurations β NodeConfig β NodeConfig TypedDict Configuration for a single node in the flow. Show Fields β name str The name of the node, used in debug logging in dynamic flows. If no name is specified, an automatically-generated UUID is used. Copy Ask AI # Example name "name" : "greeting" β role_messages List[dict] Defines the role or persona of the LLM. Required for the initial node and optional for subsequent nodes. Copy Ask AI # Example role messages "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant..." } ], β task_messages List[dict] required Defines the task for a given node. Required for all nodes. Copy Ask AI # Example task messages "task_messages" : [ { "role" : "system" , # May be `user` depending on the LLM "content" : "Ask the user for their name..." } ], β context_strategy ContextStrategyConfig Strategy for managing context during transitions to this node. Copy Ask AI # Example context strategy configuration "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) β functions List[Union[dict, FlowsFunctionSchema]] required LLM function / tool call configurations, defined in one of the supported formats . Copy Ask AI # Using provider-specific dictionary format "functions" : [ { "type" : "function" , "function" : { "name" : "get_current_movies" , "handler" : get_movies, "description" : "Fetch movies currently playing" , "parameters" : { ... } }, } ] # Using FlowsFunctionSchema "functions" : [ FlowsFunctionSchema( name = "get_current_movies" , description = "Fetch movies currently playing" , properties = { ... }, required = [ ... ], handler = get_movies ) ] # Using direct functions (auto-configuration) "functions" : [get_movies] β pre_actions List[dict] Actions that execute before the LLM inference. For example, you can send a message to the TTS to speak a phrase (e.g. βHold on a momentβ¦β), which may be effective if an LLM function call takes time to execute. 
Copy Ask AI # Example pre_actions "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." } ], β post_actions List[dict] Actions that execute after the LLM inference. For example, you can end the conversation. Copy Ask AI # Example post_actions "post_actions" : [ { "type" : "end_conversation" } ] β respond_immediately bool If set to False , the LLM will not respond immediately when the node is set, but will instead wait for the user to speak first before responding. Defaults to True . Copy Ask AI # Example usage "respond_immediately" : False β Function Handler Types β LegacyFunctionHandler Callable[[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]] Legacy function handler that only receives arguments. Returns either: A FlowResult (β οΈ deprecated) A βconsolidatedβ result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) β FlowFunctionHandler Callable[[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]] Modern function handler that receives both arguments and FlowManager . Returns either: A FlowResult (β οΈ deprecated) A βconsolidatedβ result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) β DirectFunction DirectFunction Function that is meant to be passed directly into a NodeConfig rather than into the handler field of a function configuration. It must be an async function with flow_manager: FlowManager as its first parameter. It must return a ConsolidatedFunctionResult , which is a tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) β ContextStrategy β ContextStrategy Enum Strategy for managing conversation context during node transitions. Show Values β APPEND str Default strategy. Adds new messages to existing context. β RESET str Clears context and starts fresh with new messages. β RESET_WITH_SUMMARY str Resets context but includes an AI-generated summary. β ContextStrategyConfig β ContextStrategyConfig dataclass Configuration for context management strategy. Show Fields β strategy ContextStrategy required The strategy to use for context management β summary_prompt Optional[str] Required when using RESET_WITH_SUMMARY. Prompt text for generating the conversation summary. Copy Ask AI # Example usage config = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) β FlowsFunctionSchema β FlowsFunctionSchema class A standardized schema for defining functions in Pipecat Flows with flow-specific properties. Show Constructor Parameters β name str required Name of the function β description str required Description of the functionβs purpose β properties Dict[str, Any] required Dictionary defining properties types and descriptions β required List[str] required List of required parameter names β handler Optional[FunctionHandler] Function handler to process the function call β transition_to Optional[str] deprecated Target node to transition to after function execution Deprecated: instead of transition_to , use a βconsolidatedβ handler that returns a tuple (result, next node). 
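For illustration, here is a minimal sketch of the "consolidated" pattern that this deprecation note recommends: the handler returns both its result and the next node, so no transition_to field is needed. The collect_name_handler body and the "confirm" node name below are hypothetical.
from pipecat_flows import FlowArgs, FlowResult

async def collect_name_handler(args: FlowArgs) -> tuple[FlowResult, str]:
    """Record the user's name and move on to a hypothetical 'confirm' node."""
    name = args["name"]
    # The second tuple element names the next node in a static flow;
    # a dynamic flow would return a NodeConfig here instead.
    return {"status": "success", "name": name}, "confirm"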
β transition_callback Optional[Callable] deprecated Callback function for dynamic transitions Deprecated: instead of transition_callback , use a βconsolidatedβ handler that returns a tuple (result, next node). You cannot specify both transition_to and transition_callback in the same function schema. Example usage: Copy Ask AI from pipecat_flows import FlowsFunctionSchema # Define a function schema collect_name_schema = FlowsFunctionSchema( name = "collect_name" , description = "Record the user's name" , properties = { "name" : { "type" : "string" , "description" : "The user's name" } }, required = [ "name" ], handler = collect_name_handler ) # Use in node configuration node_config = { "name" : "greeting" , "task_messages" : [ { "role" : "system" , "content" : "Ask the user for their name." } ], "functions" : [collect_name_schema] } # Pass to flow manager await flow_manager.set_node_from_config(node_config) β FlowManager β FlowManager class Main class for managing conversation flows, supporting both static (configuration-driven) and dynamic (runtime-determined) flows. Show Constructor Parameters β task PipelineTask required Pipeline task for frame queueing β llm LLMService required LLM service instance (OpenAI, Anthropic, or Google). Must be initialized with the corresponding pipecat-ai provider dependency installed. β context_aggregator Any required Context aggregator used for pushing messages to the LLM service β tts Optional[Any] deprecated Optional TTS service for voice actions. Deprecated: No need to explicitly pass tts to FlowManager in order to use tts_say actions. β flow_config Optional[FlowConfig] Optional static flow configuration β context_strategy Optional[ContextStrategyConfig] Optional configuration for how context should be managed during transitions. Defaults to APPEND strategy if not specified. β Methods β initialize method Initialize the flow with starting messages. Show Parameters β initial_node NodeConfig The initial conversation node (needed for dynamic flows only). If not specified, youβll need to call set_node_from_config() to kick off the conversation. Show Raises β FlowInitializationError If initialization fails β set_node method deprecated Set up a new conversation node programmatically (dynamic flows only). In dynamic flows, the application advances the conversation using set_node to set up each next node. In static flows, set_node is triggered under the hood when a node contains a transition_to field. Deprecated: use the following patterns instead of set_node : Prefer βconsolidatedβ function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() If you really need to set a node explicitly, use set_node_from_config() (note: its name will be read from its NodeConfig ) Show Parameters β node_id str required Identifier for the new node β node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises β FlowError If node setup fails β set_node_from_config method Set up a new conversation node programmatically (dynamic flows only). Note that this method should only be used in rare circumstances. 
Most often, you should: Prefer βconsolidatedβ function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() Show Parameters β node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises β FlowError If node setup fails β register_action method Register a handler for a custom action type. Show Parameters β action_type str required String identifier for the action β handler Callable required Async or sync function that handles the action β get_current_context method Get the current conversation context. Returns a list of messages in the current context, including system messages, user messages, and assistant responses. Show Returns β messages List[dict] List of messages in the current context Show Raises β FlowError If context aggregator is not available Example usage: Copy Ask AI # Access current conversation context context = flow_manager.get_current_context() # Use in handlers async def process_response ( args : FlowArgs) -> tuple[FlowResult, str ]: context = flow_manager.get_current_context() # Process conversation history return { "status" : "success" }, "next" β State Management The FlowManager provides a state dictionary for storing conversation data: Access state Access in transitions Copy Ask AI flow_manager.state: Dict[ str , Any] # Store data flow_manager.state[ "user_age" ] = 25 β Usage Examples Static Flow Dynamic Flow Copy Ask AI flow_config: FlowConfig = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant. Your responses will be converted to audio." } ], "task_messages" : [ { "role" : "system" , "content" : "Start by greeting the user and asking for their name." } ], "functions" : [{ "type" : "function" , "function" : { "name" : "collect_name" , "handler" : collect_name_handler, "description" : "Record user's name" , "parameters" : { ... } } }] } } } # Create and initialize the FlowManager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) # Initialize the flow_manager to start the conversation await flow_manager.initialize() β Node Functions concept Functions that execute operations within a single conversational state, without switching nodes. Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def process_data ( args : FlowArgs) -> tuple[FlowResult, None ]: """Handle data processing within a node.""" data = args[ "data" ] result = await process(data) return { "status" : "success" , "processed_data" : result }, None # Function configuration { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, "description" : "Process user data" , "parameters" : { "type" : "object" , "properties" : { "data" : { "type" : "string" } } } } } β Edge Functions concept Functions that specify a transition between nodes (optionally processing data first). 
Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def next_step ( args : FlowArgs) -> tuple[ None , str ]: """Specify the next node to transition to.""" return None , "target_node" # Return NodeConfig instead of str for dynamic flows # Function configuration { "type" : "function" , "function" : { "name" : "next_step" , "handler" : next_step, "description" : "Transition to next node" , "parameters" : { "type" : "object" , "properties" : {}} } } β Function Properties β handler Optional[Callable] Async function that processes data within a node and/or specifies the next node ( more details here ). Can be specified as: Direct function reference Either a Callable function or a string with __function__: prefix (e.g., "__function__:process_data" ) to reference a function in the main script Direct Reference Function Token Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, # Callable function "parameters" : { ... } } } β transition_callback Optional[Callable] deprecated Handler for dynamic flow transitions. Deprecated: instead of transition_callback , use a βconsolidatedβ handler that returns a tuple (result, next node). Must be an async function with one of these signatures: Copy Ask AI # New style (recommended) async def handle_transition ( args : Dict[ str , Any], result : FlowResult, flow_manager : FlowManager ) -> None : """Handle transition to next node.""" if result.available: # Type-safe access to result await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Legacy style (supported for backwards compatibility) async def handle_transition ( args : Dict[ str , Any], flow_manager : FlowManager ) -> None : """Handle transition to next node.""" await flow_manager.set_node_from_config(create_next_node()) The callback receives: args : Arguments from the function call result : Typed result from the function handler (new style only) flow_manager : Reference to the FlowManager instance Example usage: Copy Ask AI async def handle_availability_check ( args : Dict, result : TimeResult, # Typed result flow_manager : FlowManager ): """Handle availability check and transition based on result.""" if result.available: await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Use in function configuration { "type" : "function" , "function" : { "name" : "check_availability" , "handler" : check_availability, "parameters" : { ... }, "transition_callback" : handle_availability_check } } Note: A function cannot have both transition_to and transition_callback . β Handler Signatures Function handlers passed as a handler in a function configuration can be defined with three different signatures: Modern (Args + FlowManager) Legacy (Args Only) No Arguments Copy Ask AI async def handler_with_flow_manager ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Modern handler that receives both arguments and FlowManager access.""" # Access state previous_data = flow_manager.state.get( "stored_data" ) # Access pipeline resources await flow_manager.task.queue_frame(TTSSpeakFrame( "Processing your request..." 
)) # Store data in state for later flow_manager.state[ "new_data" ] = args[ "input" ] return { "status" : "success" , "result" : "Processed with flow access" }, create_next_node() The framework automatically detects which signature your handler is using and calls it appropriately. If youβre passing your function directly into your NodeConfig rather than as a handler in a function configuration, youβd use a somewhat different signature: Direct Copy Ask AI async def do_something ( flow_manager : FlowManager, foo : int , bar : str = "" ) -> tuple[FlowResult, NodeConfig]: """ Do something interesting. Args: foo (int): The foo to do something interesting with. bar (string): The bar to do something interesting with. Defaults to empty string. """ result = await fetch_data(foo, bar) next_node = create_end_node() return result, next_node β Return Types Success Response Error Response Copy Ask AI { "status" : "success" , "data" : "some data" # Optional additional data } β Provider-Specific Formats You donβt need to handle these format differences manually - use the standard format and the FlowManager will adapt it for your chosen provider. OpenAI Anthropic Google (Gemini) Copy Ask AI { "type" : "function" , "function" : { "name" : "function_name" , "handler" : handler, "description" : "Description" , "parameters" : { ... } } } β Actions pre_actions and post_actions are used to manage conversation flow. They are included in the NodeConfig and executed before and after the LLM completion, respectively. Three kinds of actions are available: Pre-canned actions: These actions perform common tasks and require little configuration. Function actions: These actions run developer-defined functions at the appropriate time. Custom actions: These are fully developer-defined actions, providing flexibility at the expense of complexity. β Pre-canned Actions Common actions shipped with Flows for managing conversation flow. To use them, just add them to your NodeConfig . β tts_say action Speaks text immediately using the TTS service. Copy Ask AI Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Processing your request..." # Required } ] β end_conversation action Ends the conversation and closes the connection. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "end_conversation" , "text" : "Goodbye!" # Optional farewell message } ] β Function Actions Actions that run developer-defined functions at the appropriate time. For example, if used in post_actions , theyβll run after the bot has finished talking and after any previous post_actions have finished. β function action Calls the developer-defined function at the appropriate time. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "function" , "handler" : bot_turn_ended # Required } ] β Custom Actions Fully developer-defined actions, providing flexibility at the expense of complexity. Hereβs the complexity: because these actions arenβt queued in the Pipecat pipeline, they may execute seemingly early if used in post_actions ; theyβll run immediately after the LLM completion is triggered but wonβt wait around for the bot to finish talking. Why would you want this behavior? 
You might be writing an action that: Itself just queues another Frame into the Pipecat pipeline (meaning there would be no benefit to waiting around for sequencing purposes) Does work that can be done a bit sooner, like logging that the LLM was updated Custom actions are composed of at least: β type str required String identifier for the action β handler Callable required Async or sync function that handles the action Example: Copy Ask AI Copy Ask AI # Define custom action handler async def custom_notification ( action : dict , flow_manager : FlowManager): """Custom action handler.""" message = action.get( "message" , "" ) await notify_user(message) # Use in node configuration "pre_actions" : [ { "type" : "notify" , "handler" : custom_notification, "message" : "Attention!" , } ] β Exceptions β FlowError exception Base exception for all flow-related errors. Copy Ask AI Copy Ask AI from pipecat_flows import FlowError try : await flow_manager.set_node_from_config(config) except FlowError as e: print ( f "Flow error: { e } " ) β FlowInitializationError exception Raised when flow initialization fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowInitializationError try : await flow_manager.initialize() except FlowInitializationError as e: print ( f "Initialization failed: { e } " ) β FlowTransitionError exception Raised when a state transition fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowTransitionError try : await flow_manager.set_node_from_config(node_config) except FlowTransitionError as e: print ( f "Transition failed: { e } " ) β InvalidFunctionError exception Raised when an invalid or unavailable function is specified. Copy Ask AI Copy Ask AI from pipecat_flows import InvalidFunctionError try : await flow_manager.set_node_from_config({ "functions" : [{ "type" : "function" , "function" : { "name" : "invalid_function" } }] }) except InvalidFunctionError as e: print ( f "Invalid function: { e } " ) RTVI Observer PipelineParams On this page Installation Core Types FlowArgs FlowResult FlowConfig NodeConfig Function Handler Types ContextStrategy ContextStrategyConfig FlowsFunctionSchema FlowManager Methods State Management Usage Examples Function Properties Handler Signatures Return Types Provider-Specific Formats Actions Pre-canned Actions Function Actions Custom Actions Exceptions Assistant Responses are generated using AI and may contain mistakes.
|
flows_pipecat-flows_7d899c19.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/frameworks/flows/pipecat-flows#param-context-strategy-1
|
2 |
+
Title: Pipecat Flows - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Pipecat Flows - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Frameworks Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline New to building conversational flows? Check out our Pipecat Flows guide first. β Installation Existing Pipecat installation Fresh Pipecat installation Copy Ask AI pip install pipecat-ai-flows β Core Types β FlowArgs β FlowArgs Dict[str, Any] Type alias for function handler arguments. β FlowResult β FlowResult TypedDict Base type for function handler results. Additional fields can be included as needed. Show Fields β status str Optional status field β error str Optional error message β FlowConfig β FlowConfig TypedDict Configuration for the entire conversation flow. Show Fields β initial_node str required Starting node identifier β nodes Dict[str, NodeConfig] required Map of node names to configurations β NodeConfig β NodeConfig TypedDict Configuration for a single node in the flow. Show Fields β name str The name of the node, used in debug logging in dynamic flows. If no name is specified, an automatically-generated UUID is used. Copy Ask AI # Example name "name" : "greeting" β role_messages List[dict] Defines the role or persona of the LLM. Required for the initial node and optional for subsequent nodes. Copy Ask AI # Example role messages "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant..." } ], β task_messages List[dict] required Defines the task for a given node. Required for all nodes. Copy Ask AI # Example task messages "task_messages" : [ { "role" : "system" , # May be `user` depending on the LLM "content" : "Ask the user for their name..." } ], β context_strategy ContextStrategyConfig Strategy for managing context during transitions to this node. Copy Ask AI # Example context strategy configuration "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) β functions List[Union[dict, FlowsFunctionSchema]] required LLM function / tool call configurations, defined in one of the supported formats . Copy Ask AI # Using provider-specific dictionary format "functions" : [ { "type" : "function" , "function" : { "name" : "get_current_movies" , "handler" : get_movies, "description" : "Fetch movies currently playing" , "parameters" : { ... } }, } ] # Using FlowsFunctionSchema "functions" : [ FlowsFunctionSchema( name = "get_current_movies" , description = "Fetch movies currently playing" , properties = { ... }, required = [ ... ], handler = get_movies ) ] # Using direct functions (auto-configuration) "functions" : [get_movies] β pre_actions List[dict] Actions that execute before the LLM inference. For example, you can send a message to the TTS to speak a phrase (e.g. βHold on a momentβ¦β), which may be effective if an LLM function call takes time to execute. 
Copy Ask AI # Example pre_actions "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." } ], β post_actions List[dict] Actions that execute after the LLM inference. For example, you can end the conversation. Copy Ask AI # Example post_actions "post_actions" : [ { "type" : "end_conversation" } ] β respond_immediately bool If set to False , the LLM will not respond immediately when the node is set, but will instead wait for the user to speak first before responding. Defaults to True . Copy Ask AI # Example usage "respond_immediately" : False β Function Handler Types β LegacyFunctionHandler Callable[[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]] Legacy function handler that only receives arguments. Returns either: A FlowResult (β οΈ deprecated) A βconsolidatedβ result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) β FlowFunctionHandler Callable[[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]] Modern function handler that receives both arguments and FlowManager . Returns either: A FlowResult (β οΈ deprecated) A βconsolidatedβ result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) β DirectFunction DirectFunction Function that is meant to be passed directly into a NodeConfig rather than into the handler field of a function configuration. It must be an async function with flow_manager: FlowManager as its first parameter. It must return a ConsolidatedFunctionResult , which is a tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) β ContextStrategy β ContextStrategy Enum Strategy for managing conversation context during node transitions. Show Values β APPEND str Default strategy. Adds new messages to existing context. β RESET str Clears context and starts fresh with new messages. β RESET_WITH_SUMMARY str Resets context but includes an AI-generated summary. β ContextStrategyConfig β ContextStrategyConfig dataclass Configuration for context management strategy. Show Fields β strategy ContextStrategy required The strategy to use for context management β summary_prompt Optional[str] Required when using RESET_WITH_SUMMARY. Prompt text for generating the conversation summary. Copy Ask AI # Example usage config = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) β FlowsFunctionSchema β FlowsFunctionSchema class A standardized schema for defining functions in Pipecat Flows with flow-specific properties. Show Constructor Parameters β name str required Name of the function β description str required Description of the functionβs purpose β properties Dict[str, Any] required Dictionary defining properties types and descriptions β required List[str] required List of required parameter names β handler Optional[FunctionHandler] Function handler to process the function call β transition_to Optional[str] deprecated Target node to transition to after function execution Deprecated: instead of transition_to , use a βconsolidatedβ handler that returns a tuple (result, next node). 
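The same consolidated idea also applies to direct functions in dynamic flows: the function receives the FlowManager first, and the second element of its return value is the next NodeConfig rather than a node name. A rough sketch, assuming a hypothetical create_end_node() helper and the import paths used elsewhere on this page:
from pipecat_flows import FlowManager, FlowResult, NodeConfig

async def record_rating(flow_manager: FlowManager, rating: int) -> tuple[FlowResult, NodeConfig]:
    """Record the user's rating.

    Args:
        rating (int): Rating from 1 to 5.
    """
    # Stash the rating in shared state, then hand back the next node.
    flow_manager.state["rating"] = rating
    return {"status": "success", "rating": rating}, create_end_node()

# In a NodeConfig, the function is passed directly: "functions": [record_rating]
# Its schema is auto-generated from the signature and docstring.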
β transition_callback Optional[Callable] deprecated Callback function for dynamic transitions Deprecated: instead of transition_callback , use a βconsolidatedβ handler that returns a tuple (result, next node). You cannot specify both transition_to and transition_callback in the same function schema. Example usage: Copy Ask AI from pipecat_flows import FlowsFunctionSchema # Define a function schema collect_name_schema = FlowsFunctionSchema( name = "collect_name" , description = "Record the user's name" , properties = { "name" : { "type" : "string" , "description" : "The user's name" } }, required = [ "name" ], handler = collect_name_handler ) # Use in node configuration node_config = { "name" : "greeting" , "task_messages" : [ { "role" : "system" , "content" : "Ask the user for their name." } ], "functions" : [collect_name_schema] } # Pass to flow manager await flow_manager.set_node_from_config(node_config) β FlowManager β FlowManager class Main class for managing conversation flows, supporting both static (configuration-driven) and dynamic (runtime-determined) flows. Show Constructor Parameters β task PipelineTask required Pipeline task for frame queueing β llm LLMService required LLM service instance (OpenAI, Anthropic, or Google). Must be initialized with the corresponding pipecat-ai provider dependency installed. β context_aggregator Any required Context aggregator used for pushing messages to the LLM service β tts Optional[Any] deprecated Optional TTS service for voice actions. Deprecated: No need to explicitly pass tts to FlowManager in order to use tts_say actions. β flow_config Optional[FlowConfig] Optional static flow configuration β context_strategy Optional[ContextStrategyConfig] Optional configuration for how context should be managed during transitions. Defaults to APPEND strategy if not specified. β Methods β initialize method Initialize the flow with starting messages. Show Parameters β initial_node NodeConfig The initial conversation node (needed for dynamic flows only). If not specified, youβll need to call set_node_from_config() to kick off the conversation. Show Raises β FlowInitializationError If initialization fails β set_node method deprecated Set up a new conversation node programmatically (dynamic flows only). In dynamic flows, the application advances the conversation using set_node to set up each next node. In static flows, set_node is triggered under the hood when a node contains a transition_to field. Deprecated: use the following patterns instead of set_node : Prefer βconsolidatedβ function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() If you really need to set a node explicitly, use set_node_from_config() (note: its name will be read from its NodeConfig ) Show Parameters β node_id str required Identifier for the new node β node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises β FlowError If node setup fails β set_node_from_config method Set up a new conversation node programmatically (dynamic flows only). Note that this method should only be used in rare circumstances. 
Most often, you should: Prefer βconsolidatedβ function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() Show Parameters β node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises β FlowError If node setup fails β register_action method Register a handler for a custom action type. Show Parameters β action_type str required String identifier for the action β handler Callable required Async or sync function that handles the action β get_current_context method Get the current conversation context. Returns a list of messages in the current context, including system messages, user messages, and assistant responses. Show Returns β messages List[dict] List of messages in the current context Show Raises β FlowError If context aggregator is not available Example usage: Copy Ask AI # Access current conversation context context = flow_manager.get_current_context() # Use in handlers async def process_response ( args : FlowArgs) -> tuple[FlowResult, str ]: context = flow_manager.get_current_context() # Process conversation history return { "status" : "success" }, "next" β State Management The FlowManager provides a state dictionary for storing conversation data: Access state Access in transitions Copy Ask AI flow_manager.state: Dict[ str , Any] # Store data flow_manager.state[ "user_age" ] = 25 β Usage Examples Static Flow Dynamic Flow Copy Ask AI flow_config: FlowConfig = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant. Your responses will be converted to audio." } ], "task_messages" : [ { "role" : "system" , "content" : "Start by greeting the user and asking for their name." } ], "functions" : [{ "type" : "function" , "function" : { "name" : "collect_name" , "handler" : collect_name_handler, "description" : "Record user's name" , "parameters" : { ... } } }] } } } # Create and initialize the FlowManager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) # Initialize the flow_manager to start the conversation await flow_manager.initialize() β Node Functions concept Functions that execute operations within a single conversational state, without switching nodes. Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def process_data ( args : FlowArgs) -> tuple[FlowResult, None ]: """Handle data processing within a node.""" data = args[ "data" ] result = await process(data) return { "status" : "success" , "processed_data" : result }, None # Function configuration { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, "description" : "Process user data" , "parameters" : { "type" : "object" , "properties" : { "data" : { "type" : "string" } } } } } β Edge Functions concept Functions that specify a transition between nodes (optionally processing data first). 
Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def next_step ( args : FlowArgs) -> tuple[ None , str ]: """Specify the next node to transition to.""" return None , "target_node" # Return NodeConfig instead of str for dynamic flows # Function configuration { "type" : "function" , "function" : { "name" : "next_step" , "handler" : next_step, "description" : "Transition to next node" , "parameters" : { "type" : "object" , "properties" : {}} } } β Function Properties β handler Optional[Callable] Async function that processes data within a node and/or specifies the next node ( more details here ). Can be specified as: Direct function reference Either a Callable function or a string with __function__: prefix (e.g., "__function__:process_data" ) to reference a function in the main script Direct Reference Function Token Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, # Callable function "parameters" : { ... } } } β transition_callback Optional[Callable] deprecated Handler for dynamic flow transitions. Deprecated: instead of transition_callback , use a βconsolidatedβ handler that returns a tuple (result, next node). Must be an async function with one of these signatures: Copy Ask AI # New style (recommended) async def handle_transition ( args : Dict[ str , Any], result : FlowResult, flow_manager : FlowManager ) -> None : """Handle transition to next node.""" if result.available: # Type-safe access to result await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Legacy style (supported for backwards compatibility) async def handle_transition ( args : Dict[ str , Any], flow_manager : FlowManager ) -> None : """Handle transition to next node.""" await flow_manager.set_node_from_config(create_next_node()) The callback receives: args : Arguments from the function call result : Typed result from the function handler (new style only) flow_manager : Reference to the FlowManager instance Example usage: Copy Ask AI async def handle_availability_check ( args : Dict, result : TimeResult, # Typed result flow_manager : FlowManager ): """Handle availability check and transition based on result.""" if result.available: await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Use in function configuration { "type" : "function" , "function" : { "name" : "check_availability" , "handler" : check_availability, "parameters" : { ... }, "transition_callback" : handle_availability_check } } Note: A function cannot have both transition_to and transition_callback . β Handler Signatures Function handlers passed as a handler in a function configuration can be defined with three different signatures: Modern (Args + FlowManager) Legacy (Args Only) No Arguments Copy Ask AI async def handler_with_flow_manager ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Modern handler that receives both arguments and FlowManager access.""" # Access state previous_data = flow_manager.state.get( "stored_data" ) # Access pipeline resources await flow_manager.task.queue_frame(TTSSpeakFrame( "Processing your request..." 
)) # Store data in state for later flow_manager.state[ "new_data" ] = args[ "input" ] return { "status" : "success" , "result" : "Processed with flow access" }, create_next_node() The framework automatically detects which signature your handler is using and calls it appropriately. If youβre passing your function directly into your NodeConfig rather than as a handler in a function configuration, youβd use a somewhat different signature: Direct Copy Ask AI async def do_something ( flow_manager : FlowManager, foo : int , bar : str = "" ) -> tuple[FlowResult, NodeConfig]: """ Do something interesting. Args: foo (int): The foo to do something interesting with. bar (string): The bar to do something interesting with. Defaults to empty string. """ result = await fetch_data(foo, bar) next_node = create_end_node() return result, next_node β Return Types Success Response Error Response Copy Ask AI { "status" : "success" , "data" : "some data" # Optional additional data } β Provider-Specific Formats You donβt need to handle these format differences manually - use the standard format and the FlowManager will adapt it for your chosen provider. OpenAI Anthropic Google (Gemini) Copy Ask AI { "type" : "function" , "function" : { "name" : "function_name" , "handler" : handler, "description" : "Description" , "parameters" : { ... } } } β Actions pre_actions and post_actions are used to manage conversation flow. They are included in the NodeConfig and executed before and after the LLM completion, respectively. Three kinds of actions are available: Pre-canned actions: These actions perform common tasks and require little configuration. Function actions: These actions run developer-defined functions at the appropriate time. Custom actions: These are fully developer-defined actions, providing flexibility at the expense of complexity. β Pre-canned Actions Common actions shipped with Flows for managing conversation flow. To use them, just add them to your NodeConfig . β tts_say action Speaks text immediately using the TTS service. Copy Ask AI Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Processing your request..." # Required } ] β end_conversation action Ends the conversation and closes the connection. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "end_conversation" , "text" : "Goodbye!" # Optional farewell message } ] β Function Actions Actions that run developer-defined functions at the appropriate time. For example, if used in post_actions , theyβll run after the bot has finished talking and after any previous post_actions have finished. β function action Calls the developer-defined function at the appropriate time. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "function" , "handler" : bot_turn_ended # Required } ] β Custom Actions Fully developer-defined actions, providing flexibility at the expense of complexity. Hereβs the complexity: because these actions arenβt queued in the Pipecat pipeline, they may execute seemingly early if used in post_actions ; theyβll run immediately after the LLM completion is triggered but wonβt wait around for the bot to finish talking. Why would you want this behavior? 
You might be writing an action that: Itself just queues another Frame into the Pipecat pipeline (meaning there would be no benefit to waiting around for sequencing purposes) Does work that can be done a bit sooner, like logging that the LLM was updated Custom actions are composed of at least: β type str required String identifier for the action β handler Callable required Async or sync function that handles the action Example: Copy Ask AI Copy Ask AI # Define custom action handler async def custom_notification ( action : dict , flow_manager : FlowManager): """Custom action handler.""" message = action.get( "message" , "" ) await notify_user(message) # Use in node configuration "pre_actions" : [ { "type" : "notify" , "handler" : custom_notification, "message" : "Attention!" , } ] β Exceptions β FlowError exception Base exception for all flow-related errors. Copy Ask AI Copy Ask AI from pipecat_flows import FlowError try : await flow_manager.set_node_from_config(config) except FlowError as e: print ( f "Flow error: { e } " ) β FlowInitializationError exception Raised when flow initialization fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowInitializationError try : await flow_manager.initialize() except FlowInitializationError as e: print ( f "Initialization failed: { e } " ) β FlowTransitionError exception Raised when a state transition fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowTransitionError try : await flow_manager.set_node_from_config(node_config) except FlowTransitionError as e: print ( f "Transition failed: { e } " ) β InvalidFunctionError exception Raised when an invalid or unavailable function is specified. Copy Ask AI Copy Ask AI from pipecat_flows import InvalidFunctionError try : await flow_manager.set_node_from_config({ "functions" : [{ "type" : "function" , "function" : { "name" : "invalid_function" } }] }) except InvalidFunctionError as e: print ( f "Invalid function: { e } " ) RTVI Observer PipelineParams On this page Installation Core Types FlowArgs FlowResult FlowConfig NodeConfig Function Handler Types ContextStrategy ContextStrategyConfig FlowsFunctionSchema FlowManager Methods State Management Usage Examples Function Properties Handler Signatures Return Types Provider-Specific Formats Actions Pre-canned Actions Function Actions Custom Actions Exceptions Assistant Responses are generated using AI and may contain mistakes.
|
frame_producer-consumer_0b939df3.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/frame/producer-consumer#param-passthrough
|
2 |
+
Title: Producer & Consumer Processors - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Producer & Consumer Processors - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Advanced Frame Processors Producer & Consumer Processors Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Producer & Consumer Processors UserIdleProcessor Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview The Producer and Consumer processors work as a pair to route frames between different parts of a pipeline, particularly useful when working with ParallelPipeline . They allow you to selectively capture frames from one pipeline branch and inject them into another. β ProducerProcessor ProducerProcessor examines frames flowing through the pipeline, applies a filter to decide which frames to share, and optionally transforms these frames before sending them to connected consumers. β Constructor Parameters β filter Callable[[Frame], Awaitable[bool]] required An async function that determines which frames should be sent to consumers. Should return True for frames to be shared. β transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before sending to consumers. By default, passes frames unchanged. β passthrough bool default: "True" When True , passes all frames through the normal pipeline flow. When False , only passes through frames that donβt match the filter. β ConsumerProcessor ConsumerProcessor receives frames from a ProducerProcessor and injects them into its pipeline branch. β Constructor Parameters β producer ProducerProcessor required The producer processor that will send frames to this consumer. β transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before injecting them into the pipeline. β direction FrameDirection default: "FrameDirection.DOWNSTREAM" The direction in which to push received frames. Usually DOWNSTREAM to send frames forward in the pipeline. β Usage Examples β Basic Usage: Moving TTS Audio Between Branches Copy Ask AI # Create a producer that captures TTS audio frames async def is_tts_audio ( frame : Frame) -> bool : return isinstance (frame, TTSAudioRawFrame) # Define an async transformer function async def tts_to_input_audio_transformer ( frame : Frame) -> Frame: if isinstance (frame, TTSAudioRawFrame): # Convert TTS audio to input audio format return InputAudioRawFrame( audio = frame.audio, sample_rate = frame.sample_rate, num_channels = frame.num_channels ) return frame producer = ProducerProcessor( filter = is_tts_audio, transformer = tts_to_input_audio_transformer, passthrough = True # Keep these frames in original pipeline ) # Create a consumer to receive the frames consumer = ConsumerProcessor( producer = producer, direction = FrameDirection.
DOWNSTREAM ) # Use in a ParallelPipeline pipeline = Pipeline([ transport.input(), ParallelPipeline( # Branch 1: LLM for bot responses [ llm, tts, producer, # Capture TTS audio here ], # Branch 2: Audio processing branch [ consumer, # Receive TTS audio here llm, # Speech-to-Speech LLM (audio in) ] ), transport.output(), ]) Sentry Metrics UserIdleProcessor On this page Overview ProducerProcessor Constructor Parameters ConsumerProcessor Constructor Parameters Usage Examples Basic Usage: Moving TTS Audio Between Branches Assistant Responses are generated using AI and may contain mistakes.
|
fundamentals_context-management_81019595.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/fundamentals/context-management#retrieving-current-context
|
2 |
+
Title: Context Management - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Context Management - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Fundamentals Context Management Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal β What is Context in Pipecat? In Pipecat, context refers to the text that the LLM uses to perform an inference. Commonly, this is the text inputted to the LLM and outputted from the LLM. The context consists of a list of alternating user/assistant messages that represents the information you want an LLM to respond to. Since Pipecat is a real-time voice (and multimodal) AI framework, the context serves as the collective history of the entire conversation. β How Context Updates During Conversations After every user and bot turn in the conversation, processors in the pipeline push frames to update the context: STT Service : Pushes TranscriptionFrame objects that represent what the user says. LLM and TTS Services : Work together to represent what the bot says. The LLM streams tokens (as LLMTextFrame s) to the TTS service, which outputs TTSTextFrame s representing the botβs spoken words. β Setting Up Context Management Pipecat includes a context aggregator class that creates and manages context for both user and assistant messages. Hereβs how to set it up: β 1. Create the Context and Context Aggregator Copy Ask AI # Create LLM service llm = OpenAILLMService( api_key = os.getenv( "OPENAI_API_KEY" )) # Create context context = OpenAILLMContext(messages, tools) # Create context aggregator instance context_aggregator = llm.create_context_aggregator(context) The context (which represents the conversation) is passed to the context aggregator. This ensures that both user and assistant instances of the context aggregators have access to the shared conversation context. β 2. Add Context Aggregators to Your Pipeline Copy Ask AI pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), # User context aggregator llm, tts, transport.output(), context_aggregator.assistant(), # Assistant context aggregator ]) β Context Aggregator Placement The placement of context aggregator instances in your pipeline is crucial for proper operation: β User Context Aggregator Place the user context aggregator downstream from the STT service . Since the userβs speech results in TranscriptionFrame objects pushed by the STT service, the user aggregator needs to be positioned to collect these frames. β Assistant Context Aggregator Place the assistant context aggregator after transport.output() . This positioning is important because: The TTS service outputs spoken words in addition to audio The assistant aggregator must be downstream to collect those frames It ensures context updates happen word-by-word for specific services (e.g. Cartesia, ElevenLabs, and Rime) Your context stays updated at the word level in case an interruption occurs Always place the assistant context aggregator after transport.output() to ensure proper word-level context updates during interruptions. 
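To tie these pieces together, here is a hedged sketch that defines the starting messages referenced above and queues the initial context frame so the bot speaks first. The system prompt is illustrative, the import path is assumed from recent Pipecat versions, and llm, task, and the pipeline are assumed to be set up as shown earlier.
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

messages = [
    {
        "role": "system",
        "content": "You are a friendly assistant. Keep your responses short.",
    },
]

context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)

# Queue the current context to trigger the bot's opening turn
await task.queue_frames([context_aggregator.user().get_context_frame()])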
β Manually Managing Context You can programmatically add new messages to the context by pushing or queueing specific frames: β Adding Messages LLMMessagesAppendFrame : Appends a new message to the existing context LLMMessagesUpdateFrame : Completely replaces the existing context with new context provided in the frame β Retrieving Current Context The context aggregator provides a get_context_frame() method to obtain the latest context: Copy Ask AI await task.queue_frames([context_aggregator.user().get_context_frame()]) β Triggering Bot Responses Youβll commonly use this manual mechanismβobtaining the current context and pushing/queueing itβto trigger the bot to speak in two scenarios: Starting a pipeline where the bot should speak first After pushing new context frames using LLMMessagesAppendFrame or LLMMessagesUpdateFrame This gives you fine-grained control over when and how the bot responds during the conversation flow. Guides Custom FrameProcessor On this page What is Context in Pipecat? How Context Updates During Conversations Setting Up Context Management 1. Create the Context and Context Aggregator 2. Add Context Aggregators to Your Pipeline Context Aggregator Placement User Context Aggregator Assistant Context Aggregator Manually Managing Context Adding Messages Retrieving Current Context Triggering Bot Responses Assistant Responses are generated using AI and may contain mistakes.
|
fundamentals_function-calling_aa2d2f1c.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/fundamentals/function-calling#param-tool-call-id
|
2 |
+
Title: Function Calling - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Function Calling - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Fundamentals Function Calling Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal β Understanding Function Calling Function calling (also known as tool calling) allows LLMs to request information from external services and APIs. This enables your bot to access real-time data and perform actions that arenβt part of its training data. For example, you could give your bot the ability to: Check current weather conditions Look up stock prices Query a database Control smart home devices Schedule appointments Hereβs how it works: You define functions the LLM can use and register them to the LLM service used in your pipeline When needed, the LLM requests a function call Your application executes any corresponding functions The result is sent back to the LLM The LLM uses this information in its response β Implementation β 1. Define Functions Pipecat provides a standardized FunctionSchema that works across all supported LLM providers. This makes it easy to define functions once and use them with any provider. As a shorthand, you could also bypass specifying a function configuration at all and instead use βdirectβ functions. Under the hood, these are converted to FunctionSchema s. β Using the Standard Schema (Recommended) Copy Ask AI from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Define a function using the standard schema weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather in a location" , properties = { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, required = [ "location" , "format" ] ) # Create a tools schema with your functions tools = ToolsSchema( standard_tools = [weather_function]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) The ToolsSchema will be automatically converted to the correct format for your LLM provider through adapters. β Using Direct Functions (Shorthand) You can bypass specifying a function configuration (as a FunctionSchema or in a provider-specific format) and instead pass the function directly to your ToolsSchema . Pipecat will auto-configure the function, gathering relevant metadata from its signature and docstring. Metadata includes: name description properties (including individual property descriptions) list of required properties Note that the function signature is a bit different when using direct functions. The first parameter is FunctionCallParams , followed by any others necessary for the function. 
Copy Ask AI from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.services.llm_service import FunctionCallParams # Define a direct function async def get_current_weather ( params : FunctionCallParams, location : str , format : str ): """Get the current weather. Args: location: The city and state, e.g. "San Francisco, CA". format: The temperature unit to use. Must be either "celsius" or "fahrenheit". """ weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Create a tools schema, passing your function directly to it tools = ToolsSchema( standard_tools = [get_current_weather]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) β Using Provider-Specific Formats (Alternative) You can also define functions in the provider-specific format if needed: OpenAI Anthropic Gemini Copy Ask AI from openai.types.chat import ChatCompletionToolParam # OpenAI native format tools = [ ChatCompletionToolParam( type = "function" , function = { "name" : "get_current_weather" , "description" : "Get the current weather" , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, "required" : [ "location" , "format" ], }, }, ) ] β Provider-Specific Custom Tools Some providers support unique tools that donβt fit the standard function schema. For these cases, you can add custom tools: Copy Ask AI from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema # Standard functions weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather" , properties = { "location" : { "type" : "string" }}, required = [ "location" ] ) # Custom Gemini search tool gemini_search_tool = { "web_search" : { "description" : "Search the web for information" } } # Create a tools schema with both standard and custom tools tools = ToolsSchema( standard_tools = [weather_function], custom_tools = { AdapterType. GEMINI : [gemini_search_tool] } ) See the provider-specific documentation for details on custom tools and their formats. β 2. Register Function Handlers Register handlers for your functions using one of these LLM service methods : register_function register_direct_function Which one you use depends on whether your function is a βdirectβ function . Non-Direct Function Direct Function Copy Ask AI from pipecat.services.llm_service import FunctionCallParams llm = OpenAILLMService( api_key = "your-api-key" ) # Main function handler - called to execute the function async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data from your API weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Register the function llm.register_function( "get_current_weather" , fetch_weather_from_api, ) β 3. Create the Pipeline Include your LLM service in your pipeline with the registered functions: Copy Ask AI # Initialize the LLM context with your function schemas context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." 
}], tools = tools ) # Create the context aggregator to collect the user and assistant context context_aggregator = llm.create_context_aggregator(context) # Create the pipeline pipeline = Pipeline([ transport.input(), # Input from the transport stt, # STT processing context_aggregator.user(), # User context aggregation llm, # LLM processing tts, # TTS processing transport.output(), # Output to the transport context_aggregator.assistant(), # Assistant context aggregation ]) β Function Handler Details β FunctionCallParams The FunctionCallParams object contains all the information needed for handling function calls: params : FunctionCallParams function_name : Name of the called function arguments : Arguments passed by the LLM tool_call_id : Unique identifier for the function call llm : Reference to the LLM service context : Current conversation context result_callback : Async function to return results β function_name str Name of the function being called β tool_call_id str Unique identifier for the function call β arguments Mapping[str, Any] Arguments passed by the LLM to the function β llm LLMService Reference to the LLM service that initiated the call β context OpenAILLMContext Current conversation context β result_callback FunctionCallResultCallback Async callback function to return results β Handler Structure Your function handler should: Receive necessary arguments, either: From params.arguments Directly From function arguments, if using direct functions Process data or call external services Return results via params.result_callback(result) Non-Direct Function Direct Function Copy Ask AI async def fetch_weather_from_api ( params : FunctionCallParams): try : # Extract arguments location = params.arguments.get( "location" ) format_type = params.arguments.get( "format" , "celsius" ) # Call external API api_result = await weather_api.get_weather(location, format_type) # Return formatted result await params.result_callback({ "location" : location, "temperature" : api_result[ "temp" ], "conditions" : api_result[ "conditions" ], "unit" : format_type }) except Exception as e: # Handle errors await params.result_callback({ "error" : f "Failed to get weather: { str (e) } " }) β Controlling Function Call Behavior (Advanced) When returning results from a function handler, you can control how the LLM processes those results using a FunctionCallResultProperties object passed to the result callback. It can be handy to skip a completion when you have back-to-back function calls. Note, if you skip a completion, you must manually trigger one from the context. 
Properties run_llm Optional[bool] Controls whether the LLM should generate a response after the function call: True : Run LLM after the function call (default if no other function calls are in progress) False : Don't run LLM after the function call None : Use default behavior on_context_updated Optional[Callable[[], Awaitable[None]]] Optional callback that runs after the function result is added to the context Example Usage from pipecat.frames.frames import FunctionCallResultProperties from pipecat.services.llm_service import FunctionCallParams async def fetch_weather_from_api(params: FunctionCallParams): # Fetch weather data weather_data = {"conditions": "sunny", "temperature": "75"} # Don't run LLM after this function call properties = FunctionCallResultProperties(run_llm=False) await params.result_callback(weather_data, properties=properties) async def query_database(params: FunctionCallParams): # Query database results = await db.query(params.arguments["query"]) async def on_update(): await notify_system("Database query complete") # Run LLM after function call and notify when context is updated properties = FunctionCallResultProperties(run_llm=True, on_context_updated=on_update) await params.result_callback(results, properties=properties) Next steps Check out the function calling examples to see a complete example for specific LLM providers. Refer to your LLM provider's documentation to learn more about their function calling capabilities.
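The registration step above only shows the non-direct variant in this capture. For completeness, here is a minimal sketch of the direct-function variant, reusing the get_current_weather direct function defined earlier. It assumes register_direct_function takes the function object itself rather than a name/handler pair; check the LLM service reference for the exact signature.

from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService

llm = OpenAILLMService(api_key="your-api-key")

# Direct function: name, description, and parameters are derived from the
# signature and docstring, so no separate FunctionSchema is needed.
async def get_current_weather(params: FunctionCallParams, location: str, format: str):
    """Get the current weather.

    Args:
        location: The city and state, e.g. "San Francisco, CA".
        format: The temperature unit to use. Must be either "celsius" or "fahrenheit".
    """
    weather_data = {"conditions": "sunny", "temperature": "75"}
    await params.result_callback(weather_data)

# Register the function itself; no name string is passed (signature assumed).
llm.register_direct_function(get_current_weather)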
|
fundamentals_function-calling_aba12231.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/fundamentals/function-calling#2-register-function-handlers
|
2 |
+
Title: Function Calling - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Function Calling - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Fundamentals Function Calling Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal β Understanding Function Calling Function calling (also known as tool calling) allows LLMs to request information from external services and APIs. This enables your bot to access real-time data and perform actions that arenβt part of its training data. For example, you could give your bot the ability to: Check current weather conditions Look up stock prices Query a database Control smart home devices Schedule appointments Hereβs how it works: You define functions the LLM can use and register them to the LLM service used in your pipeline When needed, the LLM requests a function call Your application executes any corresponding functions The result is sent back to the LLM The LLM uses this information in its response β Implementation β 1. Define Functions Pipecat provides a standardized FunctionSchema that works across all supported LLM providers. This makes it easy to define functions once and use them with any provider. As a shorthand, you could also bypass specifying a function configuration at all and instead use βdirectβ functions. Under the hood, these are converted to FunctionSchema s. β Using the Standard Schema (Recommended) Copy Ask AI from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Define a function using the standard schema weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather in a location" , properties = { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, required = [ "location" , "format" ] ) # Create a tools schema with your functions tools = ToolsSchema( standard_tools = [weather_function]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) The ToolsSchema will be automatically converted to the correct format for your LLM provider through adapters. β Using Direct Functions (Shorthand) You can bypass specifying a function configuration (as a FunctionSchema or in a provider-specific format) and instead pass the function directly to your ToolsSchema . Pipecat will auto-configure the function, gathering relevant metadata from its signature and docstring. Metadata includes: name description properties (including individual property descriptions) list of required properties Note that the function signature is a bit different when using direct functions. The first parameter is FunctionCallParams , followed by any others necessary for the function. 
Copy Ask AI from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.services.llm_service import FunctionCallParams # Define a direct function async def get_current_weather ( params : FunctionCallParams, location : str , format : str ): """Get the current weather. Args: location: The city and state, e.g. "San Francisco, CA". format: The temperature unit to use. Must be either "celsius" or "fahrenheit". """ weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Create a tools schema, passing your function directly to it tools = ToolsSchema( standard_tools = [get_current_weather]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) β Using Provider-Specific Formats (Alternative) You can also define functions in the provider-specific format if needed: OpenAI Anthropic Gemini Copy Ask AI from openai.types.chat import ChatCompletionToolParam # OpenAI native format tools = [ ChatCompletionToolParam( type = "function" , function = { "name" : "get_current_weather" , "description" : "Get the current weather" , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, "required" : [ "location" , "format" ], }, }, ) ] β Provider-Specific Custom Tools Some providers support unique tools that donβt fit the standard function schema. For these cases, you can add custom tools: Copy Ask AI from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema # Standard functions weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather" , properties = { "location" : { "type" : "string" }}, required = [ "location" ] ) # Custom Gemini search tool gemini_search_tool = { "web_search" : { "description" : "Search the web for information" } } # Create a tools schema with both standard and custom tools tools = ToolsSchema( standard_tools = [weather_function], custom_tools = { AdapterType. GEMINI : [gemini_search_tool] } ) See the provider-specific documentation for details on custom tools and their formats. β 2. Register Function Handlers Register handlers for your functions using one of these LLM service methods : register_function register_direct_function Which one you use depends on whether your function is a βdirectβ function . Non-Direct Function Direct Function Copy Ask AI from pipecat.services.llm_service import FunctionCallParams llm = OpenAILLMService( api_key = "your-api-key" ) # Main function handler - called to execute the function async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data from your API weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Register the function llm.register_function( "get_current_weather" , fetch_weather_from_api, ) β 3. Create the Pipeline Include your LLM service in your pipeline with the registered functions: Copy Ask AI # Initialize the LLM context with your function schemas context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." 
}], tools = tools ) # Create the context aggregator to collect the user and assistant context context_aggregator = llm.create_context_aggregator(context) # Create the pipeline pipeline = Pipeline([ transport.input(), # Input from the transport stt, # STT processing context_aggregator.user(), # User context aggregation llm, # LLM processing tts, # TTS processing transport.output(), # Output to the transport context_aggregator.assistant(), # Assistant context aggregation ]) β Function Handler Details β FunctionCallParams The FunctionCallParams object contains all the information needed for handling function calls: params : FunctionCallParams function_name : Name of the called function arguments : Arguments passed by the LLM tool_call_id : Unique identifier for the function call llm : Reference to the LLM service context : Current conversation context result_callback : Async function to return results β function_name str Name of the function being called β tool_call_id str Unique identifier for the function call β arguments Mapping[str, Any] Arguments passed by the LLM to the function β llm LLMService Reference to the LLM service that initiated the call β context OpenAILLMContext Current conversation context β result_callback FunctionCallResultCallback Async callback function to return results β Handler Structure Your function handler should: Receive necessary arguments, either: From params.arguments Directly From function arguments, if using direct functions Process data or call external services Return results via params.result_callback(result) Non-Direct Function Direct Function Copy Ask AI async def fetch_weather_from_api ( params : FunctionCallParams): try : # Extract arguments location = params.arguments.get( "location" ) format_type = params.arguments.get( "format" , "celsius" ) # Call external API api_result = await weather_api.get_weather(location, format_type) # Return formatted result await params.result_callback({ "location" : location, "temperature" : api_result[ "temp" ], "conditions" : api_result[ "conditions" ], "unit" : format_type }) except Exception as e: # Handle errors await params.result_callback({ "error" : f "Failed to get weather: { str (e) } " }) β Controlling Function Call Behavior (Advanced) When returning results from a function handler, you can control how the LLM processes those results using a FunctionCallResultProperties object passed to the result callback. It can be handy to skip a completion when you have back-to-back function calls. Note, if you skip a completion, you must manually trigger one from the context. 
β Properties β run_llm Optional[bool] Controls whether the LLM should generate a response after the function call: True : Run LLM after function call (default if no other function calls in progress) False : Donβt run LLM after function call None : Use default behavior β on_context_updated Optional[Callable[[], Awaitable[None]]] Optional callback that runs after the function result is added to the context β Example Usage Copy Ask AI from pipecat.frames.frames import FunctionCallResultProperties from pipecat.services.llm_service import FunctionCallParams async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data weather_data = { "conditions" : "sunny" , "temperature" : "75" } # Don't run LLM after this function call properties = FunctionCallResultProperties( run_llm = False ) await params.result_callback(weather_data, properties = properties) async def query_database ( params : FunctionCallParams): # Query database results = await db.query(params.arguments[ "query" ]) async def on_update (): await notify_system( "Database query complete" ) # Run LLM after function call and notify when context is updated properties = FunctionCallResultProperties( run_llm = True , on_context_updated = on_update ) await params.result_callback(results, properties = properties) β Next steps Check out the function calling examples to see a complete example for specific LLM providers. Refer to your LLM providerβs documentation to learn more about their function calling capabilities. Ending a Pipeline Muting User Input On this page Understanding Function Calling Implementation 1. Define Functions Using the Standard Schema (Recommended) Using Direct Functions (Shorthand) Using Provider-Specific Formats (Alternative) Provider-Specific Custom Tools 2. Register Function Handlers 3. Create the Pipeline Function Handler Details FunctionCallParams Handler Structure Controlling Function Call Behavior (Advanced) Properties Example Usage Next steps Assistant Responses are generated using AI and may contain mistakes.
|
fundamentals_recording-transcripts_b49334dd.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/fundamentals/recording-transcripts#basic-implementation
|
2 |
+
Title: Recording Conversation Transcripts - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Recording Conversation Transcripts - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Fundamentals Recording Conversation Transcripts Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal β Overview Recording transcripts of conversations between users and your bot is useful for debugging, analysis, and creating a record of interactions. Pipecatβs TranscriptProcessor makes it easy to collect both user and bot messages as they occur. β How It Works The TranscriptProcessor collects transcripts by: Capturing what the user says (from TranscriptionFrame s) Capturing what the bot says (from TTSTextFrame s) Emitting events with transcript updates in real-time Allowing you to handle these events with custom logic The TranscriptProcessor provides two separate processors: one for user speech and one for assistant speech. Both emit the same event type when new transcript content is available. β Basic Implementation β Step 1: Create a Transcript Processor First, initialize the transcript processor: Copy Ask AI from pipecat.processors.transcript_processor import TranscriptProcessor # Create a single transcript processor instance transcript = TranscriptProcessor() β Step 2: Add to Your Pipeline Place the processors in your pipeline at the appropriate positions: Copy Ask AI pipeline = Pipeline( [ transport.input(), stt, # Speech-to-text transcript.user(), # Captures user transcripts context_aggregator.user(), llm, tts, # Text-to-speech transport.output(), transcript.assistant(), # Captures assistant transcripts context_aggregator.assistant(), ] ) Place transcript.user() after the STT processor and transcript.assistant() after transport.output() to ensure accurate transcript collection. β Step 3: Handle Transcript Updates Register an event handler to process transcript updates: Copy Ask AI @transcript.event_handler ( "on_transcript_update" ) async def handle_transcript_update ( processor , frame ): # Each message contains role (user/assistant), content, and timestamp for message in frame.messages: print ( f "[ { message.timestamp } ] { message.role } : { message.content } " ) In addition to console logging, you can save transcripts to a database or file for later analysis. β Next Steps Try the Transcript Example Explore a complete working example that demonstrates how to collect and save conversation transcripts with Pipecat. TranscriptProcessor Reference Read the complete API reference documentation for advanced configuration options and event handlers. Consider implementing transcript recording in your application for debugging during development and preserving important conversations in production. The transcript data can also be useful for analyzing conversation patterns and improving your botβs responses over time. 
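Building on the handler above, here is a minimal sketch of persisting transcript updates to a JSONL file for later analysis, instead of (or in addition to) console logging. The output path is hypothetical, and the synchronous open/write is kept simple for illustration; a production bot might batch writes or use an async file library.

import json

TRANSCRIPT_PATH = "conversation_transcript.jsonl"  # hypothetical output path

@transcript.event_handler("on_transcript_update")
async def save_transcript_update(processor, frame):
    # Append each new message as one JSON object per line
    with open(TRANSCRIPT_PATH, "a", encoding="utf-8") as f:
        for message in frame.messages:
            f.write(json.dumps({
                "timestamp": message.timestamp,
                "role": message.role,
                "content": message.content,
            }) + "\n")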
|
fundamentals_user-input-muting_85057656.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/guides/fundamentals/user-input-muting#next-steps
|
2 |
+
Title: User Input Muting with STTMuteFilter - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
User Input Muting with STTMuteFilter - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Fundamentals User Input Muting with STTMuteFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal β Overview In conversational applications, there are moments when you donβt want to process user speech, such as during bot introductions or while executing function calls. Pipecatβs STTMuteFilter lets you selectively βmuteβ user input based on different conversation states. β When to Use STTMuteFilter Common scenarios for muting user input include: During introductions : Prevent the bot from being interrupted during its initial greeting While processing functions : Block input while the bot is retrieving external data During bot speech : Reduce false transcriptions while the bot is speaking For guided conversations : Create more structured interactions with clear turn-taking β How It Works The STTMuteFilter works by blocking specific user-related frames from flowing through your pipeline. When muted, it filters: Voice activity detection (VAD) events Interruption signals Raw audio input frames This prevents the Speech-to-Text service from receiving and processing the userβs speech during muted periods. The filter must be placed between your Transport and STT service in the pipeline to work correctly. β Mute Strategies The STTMuteFilter supports several strategies for determining when to mute user input: FIRST_SPEECH Mute only during the botβs first speech utterance. Useful for introductions when you want the bot to complete its greeting before the user can speak. MUTE_UNTIL_FIRST_BOT_COMPLETE Start muted and remain muted until the first bot utterance completes. Ensures the botβs initial instructions are fully delivered. FUNCTION_CALL Mute during function calls. Prevents users from speaking while the bot is processing external data requests. ALWAYS Mute whenever the bot is speaking. Creates a strict turn-taking conversation pattern. CUSTOM Use custom logic via callback to determine when to mute. Provides maximum flexibility for complex muting rules. The FIRST_SPEECH and MUTE_UNTIL_FIRST_BOT_COMPLETE strategies should not be used together as they handle the first bot speech differently. β Basic Implementation β Step 1: Configure the Filter First, create a configuration for the STTMuteFilter : Copy Ask AI from pipecat.processors.filters.stt_mute_filter import STTMuteConfig, STTMuteFilter, STTMuteStrategy # Configure with one or more strategies stt_mute_processor = STTMuteFilter( config = STTMuteConfig( strategies = { STTMuteStrategy. MUTE_UNTIL_FIRST_BOT_COMPLETE , STTMuteStrategy. 
FUNCTION_CALL, }), ) Step 2: Add to Your Pipeline Place the filter between your transport input and STT service: pipeline = Pipeline( [ transport.input(), # Transport user input stt_mute_processor, # Add the mute processor before STT stt, # Speech-to-text service context_aggregator.user(), # User responses llm, # LLM tts, # Text-to-speech transport.output(), # Transport bot output context_aggregator.assistant(), # Assistant spoken responses ] ) Best Practices Place the filter correctly : Always position STTMuteFilter between transport input and STT Choose strategies wisely : Select the minimal set of strategies needed for your use case Test user experience : Excessive muting can frustrate users; balance control with usability Consider feedback : Provide visual cues when the user is muted to improve the experience Next Steps Try the STTMuteFilter Example Explore a complete working example that demonstrates how to use STTMuteFilter to control user input during bot speech and function calls. STTMuteFilter Reference Read the complete API reference documentation for advanced configuration options and muting strategies. Experiment with different muting strategies to find the right balance for your application. For advanced scenarios, try implementing custom muting logic based on specific conversation states or content, as sketched below.
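As a sketch of that custom approach (not taken from this page): the CUSTOM strategy delegates the mute decision to an async callback you provide. The callback parameter name (should_mute_callback) and its exact signature are assumptions here; confirm them against the STTMuteFilter reference.

from pipecat.processors.filters.stt_mute_filter import STTMuteConfig, STTMuteFilter, STTMuteStrategy

# Hypothetical application state that decides when user speech should be ignored
collecting_payment_details = False

async def should_mute(mute_filter: STTMuteFilter) -> bool:
    # Return True to block user input frames, False to let them through
    return collecting_payment_details

stt_mute_processor = STTMuteFilter(
    config=STTMuteConfig(
        strategies={STTMuteStrategy.CUSTOM},
        should_mute_callback=should_mute,  # parameter name assumed; see the reference docs
    ),
)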
|
image-generation_fal_340084af.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/image-generation/fal#param-aiohttp-session
|
2 |
+
Title: fal - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
fal - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Image Generation fal Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation fal Google Imagen OpenAI Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview FalImageGenService provides high-speed image generation capabilities using falβs optimized Stable Diffusion XL models. It supports various image sizes, formats, and generation parameters with a focus on fast inference. β Installation To use FalImageGenService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[fal]" Youβll also need to set up your Fal API key as an environment variable: FAL_KEY You can obtain a fal API key by signing up at fal . β Configuration β Constructor Parameters β params InputParams required Generation parameters configuration β aiohttp_session aiohttp.ClientSession required HTTP session for image downloading β model str default: "fal-ai/fast-sdxl" Model identifier β key str Fal API key (alternative to environment variable) β Input Parameters Copy Ask AI class InputParams ( BaseModel ): seed: Optional[ int ] = None # Random seed for reproducibility num_inference_steps: int = 8 # Number of denoising steps num_images: int = 1 # Number of images to generate image_size: Union[ str , Dict[ str , int ]] = "square_hd" # Image dimensions expand_prompt: bool = False # Enhance prompt automatically enable_safety_checker: bool = True # Filter unsafe content format : str = "png" # Output image format β Supported Image Sizes Possible enum values: square_hd , square , portrait_4_3 , portrait_16_9 , landscape_4_3 , landscape_16_9 Note: For custom image sizes, you can pass the width and height as an object: Copy Ask AI { "image_size" : { "width" : 1280 , "height" : 720 } } See the fal docs for more information. β Output Frames β URLImageRawFrame β url string Generated image URL β image bytes Raw image data β size tuple Image dimensions (width, height) β format string Image format (e.g., βPNGβ) β ErrorFrame β error string Error information if generation fails β Methods See the Image Generation base class methods for additional functionality. 
Usage Example import aiohttp from pipecat.services.fal.image import FalImageGenService # Configure service async with aiohttp.ClientSession() as session: service = FalImageGenService( model="fal-ai/fast-sdxl", aiohttp_session=session, params=FalImageGenService.InputParams( num_inference_steps=8, image_size="portrait_16_9", expand_prompt=True ) ) # Use in pipeline pipeline = Pipeline([ prompt_input, # Produces text prompts service, # Generates images image_handler # Handles generated images ]) Frame Flow Metrics Support The service collects processing metrics: Generation time Download time API response time Total processing duration Notes Fast inference times with optimized models Supports various image sizes and formats Automatic prompt enhancement option Built-in safety filtering Asynchronous operation Efficient HTTP session management Comprehensive error handling
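As noted under Supported Image Sizes, a custom width/height can be passed instead of a preset. A minimal sketch, assuming an existing aiohttp.ClientSession named session as in the example above:

from pipecat.services.fal.image import FalImageGenService

service = FalImageGenService(
    model="fal-ai/fast-sdxl",
    aiohttp_session=session,  # an existing aiohttp.ClientSession
    params=FalImageGenService.InputParams(
        num_inference_steps=8,
        image_size={"width": 1280, "height": 720},  # custom dimensions instead of an enum preset
        enable_safety_checker=True,
    ),
)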
|
image-generation_fal_4e43655d.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/image-generation/fal#metrics-support
|
2 |
+
Title: fal - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
fal - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Image Generation fal Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation fal Google Imagen OpenAI Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview FalImageGenService provides high-speed image generation capabilities using falβs optimized Stable Diffusion XL models. It supports various image sizes, formats, and generation parameters with a focus on fast inference. β Installation To use FalImageGenService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[fal]" Youβll also need to set up your Fal API key as an environment variable: FAL_KEY You can obtain a fal API key by signing up at fal . β Configuration β Constructor Parameters β params InputParams required Generation parameters configuration β aiohttp_session aiohttp.ClientSession required HTTP session for image downloading β model str default: "fal-ai/fast-sdxl" Model identifier β key str Fal API key (alternative to environment variable) β Input Parameters Copy Ask AI class InputParams ( BaseModel ): seed: Optional[ int ] = None # Random seed for reproducibility num_inference_steps: int = 8 # Number of denoising steps num_images: int = 1 # Number of images to generate image_size: Union[ str , Dict[ str , int ]] = "square_hd" # Image dimensions expand_prompt: bool = False # Enhance prompt automatically enable_safety_checker: bool = True # Filter unsafe content format : str = "png" # Output image format β Supported Image Sizes Possible enum values: square_hd , square , portrait_4_3 , portrait_16_9 , landscape_4_3 , landscape_16_9 Note: For custom image sizes, you can pass the width and height as an object: Copy Ask AI { "image_size" : { "width" : 1280 , "height" : 720 } } See the fal docs for more information. β Output Frames β URLImageRawFrame β url string Generated image URL β image bytes Raw image data β size tuple Image dimensions (width, height) β format string Image format (e.g., βPNGβ) β ErrorFrame β error string Error information if generation fails β Methods See the Image Generation base class methods for additional functionality. 
β Usage Example Copy Ask AI import aiohttp from pipecat.services.fal.image import FalImageGenService # Configure service async with aiohttp.ClientSession() as session: service = FalImageGenService( model = "fal-ai/fast-sdxl" , aiohttp_session = session, params = FalImageGenService.InputParams( num_inference_steps = 8 , image_size = "portrait_hd" , expand_prompt = True ) ) # Use in pipeline pipeline = Pipeline([ prompt_input, # Produces text prompts service, # Generates images image_handler # Handles generated images ]) β Frame Flow β Metrics Support The service collects processing metrics: Generation time Download time API response time Total processing duration β Notes Fast inference times with optimized models Supports various image sizes and formats Automatic prompt enhancement option Built-in safety filtering Asynchronous operation Efficient HTTP session management Comprehensive error handling OpenAI Realtime Beta Google Imagen On this page Overview Installation Configuration Constructor Parameters Input Parameters Supported Image Sizes Output Frames URLImageRawFrame ErrorFrame Methods Usage Example Frame Flow Metrics Support Notes Assistant Responses are generated using AI and may contain mistakes.
|
image-generation_openai_ba6382a9.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/image-generation/openai#param-api-key
|
2 |
+
Title: OpenAI Image Generation - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
OpenAI Image Generation - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Image Generation OpenAI Image Generation Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation fal Google Imagen OpenAI Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview OpenAIImageGenService provides high-quality image generation capabilities using OpenAIβs DALL-E models. It transforms text prompts into images with various size options and model configurations. β Installation No additional installation is required for the OpenAIImageGenService as it is part of the Pipecat AI package. Youβll also need an OpenAI API key for authentication. β Configuration β Constructor Parameters β api_key str required OpenAI API key for authentication β base_url str default: "None" Optional base URL for OpenAI API requests β aiohttp_session aiohttp.ClientSession required HTTP session for making requests β image_size str required Image dimensions - one of β256x256β, β512x512β, β1024x1024β, β1792x1024β, β1024x1792β β model str default: "dall-e-3" OpenAI model identifier for image generation β Input The service accepts text prompts through its image generation pipeline. β Output Frames β URLImageRawFrame β url string Generated image URL from OpenAI β image bytes Raw image data β size tuple Image dimensions (width, height) β format string Image format (e.g., βJPEGβ) β ErrorFrame β error string Error information if generation fails β Usage Example Copy Ask AI import aiohttp from pipecat.pipeline.pipeline import Pipeline from pipecat.services.openai.image import OpenAIImageGenService # Create an aiohttp session aiohttp_session = aiohttp.ClientSession() # Configure service image_gen = OpenAIImageGenService( api_key = "your-openai-api-key" , aiohttp_session = aiohttp_session, image_size = "1024x1024" , model = "dall-e-3" ) # Use in pipeline main_pipeline = Pipeline( [ transport.input(), context_aggregator.user(), llm_service, image_gen, tts_service, transport.output(), context_aggregator.assistant(), ] ) β Frame Flow β Metrics Support The service supports metrics collection: Time to First Byte (TTFB) Processing duration API response metrics β Model Support OpenAIβs image generation service offers different model variants: Model ID Description dall-e-3 Latest DALL-E model with higher quality and better prompt following dall-e-2 Previous generation model with good quality and lower cost β Image Size Options Size Option Aspect Ratio Description 256x256 1:1 Small square image 512x512 1:1 Medium square image 1024x1024 1:1 Large square image 1792x1024 16:9 Horizontal/landscape orientation 1024x1792 9:16 Vertical/portrait orientation β Error Handling Copy Ask AI try : async for frame in image_gen.run_image_gen(prompt): if isinstance (frame, ErrorFrame): logger.error( f "Image generation error: { frame.error } " ) else : # Process successful image generation pass except Exception as e: logger.error( f "Unexpected error during image generation: { e } " ) 
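For lower-cost generation, the model and size tables above suggest pairing dall-e-2 with a smaller square size. A minimal sketch; the environment variable name and session handling are assumptions, and the session should be closed when the pipeline shuts down.

import os
import aiohttp
from pipecat.services.openai.image import OpenAIImageGenService

async def create_image_service() -> OpenAIImageGenService:
    session = aiohttp.ClientSession()  # close this when your pipeline shuts down
    return OpenAIImageGenService(
        api_key=os.getenv("OPENAI_API_KEY"),  # assumed environment variable name
        aiohttp_session=session,
        image_size="512x512",  # medium square option from the size table
        model="dall-e-2",      # lower-cost model from the model table
    )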
|
ios_introduction_2aa11c8e.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/client/ios/introduction#example
|
2 |
+
Title: SDK Introduction - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
SDK Introduction - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation iOS SDK SDK Introduction Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat iOS SDK provides a Swift implementation for building voice and multimodal AI applications on iOS. It handles: Real-time audio streaming Bot communication and state management Media device handling Configuration management Event handling β Installation Add the SDK to your project using Swift Package Manager: Copy Ask AI // Core SDK . package ( url : "https://github.com/pipecat-ai/pipecat-client-ios.git" , from : "0.3.0" ), // Daily transport implementation . package ( url : "https://github.com/pipecat-ai/pipecat-client-ios-daily.git" , from : "0.3.0" ), Then add the dependencies to your target: Copy Ask AI . target ( name : "YourApp" , dependencies : [ . product ( name : "PipecatClientIOS" , package : "pipecat-client-ios" ) . product ( name : "PipecatClientIOSDaily" , package : "pipecat-client-ios-daily" ) ]), β Example Hereβs a simple example using Daily as the transport layer: Copy Ask AI import PipecatClientIOS import PipecatClientIOSDaily let clientConfig = [ ServiceConfig ( service : "llm" , options : [ Option ( name : "model" , value : . string ( "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo" )), Option ( name : "messages" , value : . array ([ . object ([ "role" : . string ( "system" ), "content" : . string ( "You are a helpful assistant." ) ]) ])) ] ), ServiceConfig ( service : "tts" , options : [ Option ( name : "voice" , value : . string ( "79a125e8-cd45-4c13-8a67-188112f4dd22" )) ] ) ] let options = RTVIClientOptions. init ( enableMic : true , params : RTVIClientParams ( baseUrl : $PIPECAT_API_URL, config : clientConfig ) ) let client = RTVIClient. init ( transport : DailyTransport. init ( options : configOptions), options : configOptions ) try await client. start () β Documentation API Reference Complete SDK API documentation Source Pipecat Client iOS Demo Simple Chatbot Demo Daily Transport WebRTC implementation using Daily API Reference API Reference On this page Installation Example Documentation Assistant Responses are generated using AI and may contain mistakes.
|
llm_gemini_6ea32a78.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/gemini#input
|
2 |
+
Title: Google Gemini - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Google Gemini - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation LLM Google Gemini Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview GoogleLLMService provides integration with Googleβs Gemini models, supporting streaming responses, function calling, and multimodal inputs. It includes specialized context handling for Googleβs message format while maintaining compatibility with OpenAI-style contexts. API Reference Complete API documentation and method details Gemini Docs Official Google Gemini API documentation and features Example Code Working example with function calling β Installation To use GoogleLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[google]" Youβll also need to set up your Google API key as an environment variable: GOOGLE_API_KEY . Get your API key from Google AI Studio . β Frames β Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates β Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks LLMSearchResponseFrame - Search grounding results with citations FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors β Search Grounding Google Geminiβs search grounding feature enables real-time web search integration, allowing the model to access current information and provide citations. This is particularly valuable for applications requiring up-to-date information. β Enabling Search Grounding Copy Ask AI # Configure search grounding tool search_tool = { "google_search_retrieval" : { "dynamic_retrieval_config" : { "mode" : "MODE_DYNAMIC" , "dynamic_threshold" : 0.3 , # Lower = more frequent grounding } } } # Initialize with search grounding llm = GoogleLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), model = "gemini-1.5-flash-002" , system_instruction = "You are a helpful assistant with access to current information." 
, tools = [search_tool] ) β Handling Search Results Search grounding produces LLMSearchResponseFrame with detailed citation information: Copy Ask AI @pipeline.event_handler ( "llm_search_response" ) async def handle_search_response ( frame ): print ( f "Search result: { frame.search_result } " ) print ( f "Sources: { len (frame.origins) } citations" ) for origin in frame.origins: print ( f "- { origin[ 'site_title' ] } : { origin[ 'site_uri' ] } " ) β Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. β Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. β Usage Example Copy Ask AI import os from pipecat.services.google.llm import GoogleLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure Gemini service with search grounding search_tool = { "google_search_retrieval" : { "dynamic_retrieval_config" : { "mode" : "MODE_DYNAMIC" , "dynamic_threshold" : 0.3 } } } llm = GoogleLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), model = "gemini-2.0-flash" , system_instruction = """You are a helpful assistant with access to current information. When users ask about recent events, use search to provide accurate, up-to-date information.""" , tools = [search_tool], params = GoogleLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. San Francisco, CA" } }, required = [ "location" ] ) # Define image capture function for multimodal capabilities image_function = FunctionSchema( name = "get_image" , description = "Capture and analyze an image from the video stream" , properties = { "question" : { "type" : "string" , "description" : "Question about what to analyze in the image" } }, required = [ "question" ] ) tools = ToolsSchema( standard_tools = [weather_function, image_function]) # Create context with multimodal system prompt context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful assistant with access to current information and vision capabilities. You can answer questions about weather, analyze images from video streams, and search for current information. Keep responses concise for voice output.""" }, { "role" : "user" , "content" : "Hello! What can you help me with?" 
} ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handlers async def get_weather(params): location = params.arguments["location"] await params.result_callback(f"Weather in {location}: 72°F and sunny") async def get_image(params): question = params.arguments["question"] # Request image from video stream await params.llm.request_image_frame( user_id=client_id, function_name=params.function_name, tool_call_id=params.tool_call_id, text_content=question ) await params.result_callback(f"Analyzing image for: {question}") llm.register_function("get_weather", get_weather) llm.register_function("get_image", get_image) # Optional: Add function call feedback @llm.event_handler("on_function_calls_started") async def on_function_calls_started(service, function_calls): await tts.queue_frame(TTSSpeakFrame("Let me check on that.")) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) Metrics Google Gemini provides comprehensive usage tracking: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Prompt tokens, completion tokens, and totals Enable with: task = PipelineTask( pipeline, params=PipelineParams( enable_metrics=True, enable_usage_metrics=True ) ) Additional Notes Multimodal Capabilities : Native support for text, images, audio, and video processing Search Grounding : Real-time web search with automatic citation and source attribution System Instructions : Gemini handles system messages differently than OpenAI - set them during service initialization Vision Functions : Built-in support for image capture and analysis from video streams
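The event-handler example under Handling Search Results is one option; another (a minimal sketch, not from this page) is a small frame processor placed after the LLM that logs LLMSearchResponseFrame citations as they stream past. The import path for LLMSearchResponseFrame and its field names (search_result, origins) are taken from the frames list and handler above, but treat them as assumptions to verify.

from pipecat.frames.frames import Frame, LLMSearchResponseFrame  # import path assumed
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor

class SearchCitationLogger(FrameProcessor):
    """Logs search-grounded responses and their sources, then passes frames through."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, LLMSearchResponseFrame):
            print(f"Search result: {frame.search_result}")
            for origin in frame.origins:
                print(f"- {origin['site_title']}: {origin['site_uri']}")
        await self.push_frame(frame, direction)

# Place it after the llm in the pipeline, e.g. Pipeline([..., llm, SearchCitationLogger(), tts, ...])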
|
llm_grok_01e8e47f.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/grok#additional-notes
|
2 |
+
Title: Grok - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Grok - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation LLM Grok Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview GrokLLMService provides access to Grokβs language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. API Reference Complete API documentation and method details Grok Docs Official Grok API documentation and features Example Code Working example with function calling β Installation To use GrokLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[grok]" Youβll also need to set up your Grok API key as an environment variable: GROK_API_KEY . Get your API key from X.AI Console . β Frames β Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates β Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors β Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. β Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. β Usage Example Copy Ask AI import os from pipecat.services.grok.llm import GrokLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure Grok service llm = GrokLLMService( api_key = os.getenv( "GROK_API_KEY" ), model = "grok-3-beta" , params = GrokLLMService.InputParams( temperature = 0.8 , # Higher for creative responses max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context optimized for voice interaction context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful and creative assistant in a voice conversation. Your output will be converted to audio, so avoid special characters. Respond in an engaging and helpful way while being succinct.""" } ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handler async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75Β°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) β Metrics Inherits all OpenAI metrics capabilities with specialized token tracking: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Accumulated prompt tokens, completion tokens, and totals Grok uses incremental token reporting, so metrics are accumulated and reported at the end of each response. Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) β Additional Notes OpenAI Compatibility : Full compatibility with OpenAI API features and parameters Real-time Information : Access to current events and up-to-date information Vision Capabilities : Image understanding and analysis with grok-2-vision model Google Vertex AI Groq On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
|
llm_grok_fddf9bc0.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/llm/grok#context-management
|
2 |
+
Title: Grok - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Grok - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation LLM Grok Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview GrokLLMService provides access to Grokβs language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses, function calling, and context management. API Reference Complete API documentation and method details Grok Docs Official Grok API documentation and features Example Code Working example with function calling β Installation To use GrokLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[grok]" Youβll also need to set up your Grok API key as an environment variable: GROK_API_KEY . Get your API key from X.AI Console . β Frames β Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision processing LLMUpdateSettingsFrame - Runtime parameter updates β Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors β Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. β Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. β Usage Example Copy Ask AI import os from pipecat.services.grok.llm import GrokLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure Grok service llm = GrokLLMService( api_key = os.getenv( "GROK_API_KEY" ), model = "grok-3-beta" , params = GrokLLMService.InputParams( temperature = 0.8 , # Higher for creative responses max_tokens = 1000 ) ) # Define function for tool calling weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context optimized for voice interaction context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful and creative assistant in a voice conversation. Your output will be converted to audio, so avoid special characters. Respond in an engaging and helpful way while being succinct.""" } ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handler async def fetch_weather ( params ): location = params.arguments[ "location" ] await params.result_callback({ "conditions" : "sunny" , "temperature" : "75Β°F" }) llm.register_function( "get_current_weather" , fetch_weather) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant() ]) β Metrics Inherits all OpenAI metrics capabilities with specialized token tracking: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Accumulated prompt tokens, completion tokens, and totals Grok uses incremental token reporting, so metrics are accumulated and reported at the end of each response. Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) β Additional Notes OpenAI Compatibility : Full compatibility with OpenAI API features and parameters Real-time Information : Access to current events and up-to-date information Vision Capabilities : Image understanding and analysis with grok-2-vision model Google Vertex AI Groq On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
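The notes above mention image understanding via the grok-2-vision model. A minimal sketch, assuming the same constructor arguments shown in the usage example, for selecting that model:

# Sketch: pointing GrokLLMService at the vision-capable model named in the notes.
# Only the model name changes relative to the usage example above.
import os
from pipecat.services.grok.llm import GrokLLMService

vision_llm = GrokLLMService(
    api_key=os.getenv("GROK_API_KEY"),
    model="grok-2-vision",  # image understanding and analysis
)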
|
llm_ollama_015de6b2.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/llm/ollama#input
+Title: Ollama - Pipecat
+==================================================
+
Ollama - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation LLM Ollama Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview OLLamaLLMService provides access to locally-run Ollama models through an OpenAI-compatible interface. It inherits from BaseOpenAILLMService and allows you to run various open-source models locally while maintaining compatibility with OpenAIβs API format. API Reference Complete API documentation and method details Ollama Docs Official Ollama documentation and model library Download Ollama Download and setup instructions for Ollama β Installation To use Ollama services, you need to install both Ollama and the Pipecat dependency: Install Ollama on your system from ollama.com/download Install Pipecat dependency : Copy Ask AI pip install "pipecat-ai[ollama]" Pull a model (first time only): Copy Ask AI ollama pull llama2 Ollama runs as a local service on port 11434. No API key required! β Frames β Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list VisionImageRawFrame - Images for vision models LLMUpdateSettingsFrame - Runtime parameter updates β Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - Connection or processing errors β Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. β Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. β Usage Example Copy Ask AI from pipecat.services.ollama.llm import OLLamaLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Configure local Ollama service llm = OLLamaLLMService( model = "llama3.1" , # Must be pulled first: ollama pull llama3.1 base_url = "http://localhost:11434/v1" , # Default Ollama endpoint params = OLLamaLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Define function for local processing weather_function = FunctionSchema( name = "get_current_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City and state, e.g. 
San Francisco, CA" }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "Temperature unit to use" } }, required = [ "location" , "format" ] ) tools = ToolsSchema( standard_tools = [weather_function]) # Create context optimized for local model context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a helpful assistant running locally. Be concise and efficient in your responses while maintaining helpfulness.""" } ], tools = tools ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Register function handler - all processing stays local async def fetch_weather ( params ): location = params.arguments[ "location" ] # Local weather lookup or cached data await params.result_callback({ "conditions" : "sunny" , "temperature" : "22Β°C" }) llm.register_function( "get_current_weather" , fetch_weather) # Use in pipeline - completely offline capable pipeline = Pipeline([ transport.input(), stt, # Can use local STT too context_aggregator.user(), llm, # All inference happens locally tts, # Can use local TTS too transport.output(), context_aggregator.assistant() ]) β Metrics Inherits all OpenAI metrics capabilities for local monitoring: Time to First Byte (TTFB) - Local inference latency Processing Duration - Model execution time Token Usage - Local token counting (if supported by model) Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) β Additional Notes Run models locally : Ollama allows you to run various open-source models on your own hardware, providing flexibility and control. OpenAI Compatibility : Full compatibility with OpenAI API features and parameters Privacy centric : All processing happens locally, ensuring data privacy and security. NVIDIA NIM OpenAI On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
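Since everything runs against a local server on port 11434, it can help to verify that Ollama is reachable before constructing the service. A small pre-flight sketch, assuming Ollama's default port and its /api/tags model-listing endpoint:

# Optional pre-flight check (sketch): confirm the local Ollama server is up
# before building the pipeline. Port 11434 and the /api/tags endpoint are
# assumptions about a default Ollama install.
import urllib.request

def ollama_is_running(base: str = "http://localhost:11434") -> bool:
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

if not ollama_is_running():
    raise RuntimeError("Ollama is not reachable on localhost:11434; start it first.")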
|
llm_openai_cfc7f2e1.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/llm/openai
+Title: OpenAI - Pipecat
+==================================================
+
OpenAI - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation LLM OpenAI Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview OpenAILLMService provides chat completion capabilities using OpenAIβs API, supporting streaming responses, function calling, vision input, and advanced context management for conversational AI applications. API Reference Complete API documentation and method details OpenAI Docs Official OpenAI API documentation Example Code Function calling example with weather API β Installation To use OpenAI services, install the required dependencies: Copy Ask AI pip install "pipecat-ai[openai]" Youβll also need to set up your OpenAI API key as an environment variable: OPENAI_API_KEY . Get your API key from the OpenAI Platform . β Frames β Input OpenAILLMContextFrame - OpenAI-specific conversation context LLMMessagesFrame - Standard conversation messages VisionImageRawFrame - Images for vision model processing LLMUpdateSettingsFrame - Runtime model configuration updates β Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks FunctionCallInProgressFrame / FunctionCallResultFrame - Function call lifecycle ErrorFrame - API or processing errors β Function Calling Function Calling Guide Learn how to implement function calling with standardized schemas, register handlers, manage context properly, and control execution flow in your conversational AI applications. β Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. β Usage Example β Basic Conversation with Function Calling Copy Ask AI import os from pipecat.services.openai.llm import OpenAILLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.services.llm_service import FunctionCallParams # Configure the service llm = OpenAILLMService( model = "gpt-4o" , api_key = os.getenv( "OPENAI_API_KEY" ), params = OpenAILLMService.InputParams( temperature = 0.7 , ) ) # Define function schema weather_function = FunctionSchema( name = "get_weather" , description = "Get current weather information" , properties = { "location" : { "type" : "string" , "description" : "City name" } }, required = [ "location" ] ) # Create tools and context tools = ToolsSchema( standard_tools = [weather_function]) context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant. Keep responses concise." 
}], tools = tools ) # Register function handler async def get_weather_handler ( params : FunctionCallParams): location = params.arguments.get( "location" ) # Call weather API here... weather_data = { "temperature" : "75Β°F" , "conditions" : "sunny" } await params.result_callback(weather_data) llm.register_function( "get_weather" , get_weather_handler) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Use in pipeline pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), # Handles user messages llm, # Processes with OpenAI tts, transport.output(), context_aggregator.assistant() # Captures responses ]) β Metrics The service provides: Time to First Byte (TTFB) - Latency from request to first response token Processing Duration - Total request processing time Token Usage - Prompt tokens, completion tokens, and total usage Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True ) ) β Additional Notes Streaming Responses : All responses are streamed for low latency Context Persistence : Use context aggregators to maintain conversation history Error Handling : Automatic retry logic for rate limits and transient errors Compatible Services : Works with OpenAI-compatible APIs by setting base_url Ollama OpenPipe On this page Overview Installation Frames Input Output Function Calling Context Management Usage Example Basic Conversation with Function Calling Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
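The compatibility note above says the service works with OpenAI-compatible APIs by setting base_url. A sketch of that pattern; the endpoint URL and model name below are placeholders, not documented values:

# Sketch: reusing OpenAILLMService against an OpenAI-compatible endpoint.
import os
from pipecat.services.openai.llm import OpenAILLMService

compatible_llm = OpenAILLMService(
    api_key=os.getenv("COMPATIBLE_API_KEY"),
    base_url="https://my-openai-compatible-host/v1",  # placeholder endpoint
    model="my-hosted-model",                          # placeholder model name
)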
|
llm_perplexity_5fcd95f8.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/llm/perplexity#additional-notes
+Title: Perplexity - Pipecat
+==================================================
+
Perplexity - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation LLM Perplexity Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Anthropic AWS Bedrock Azure Cerebras DeepSeek Fireworks AI Google Gemini Google Vertex AI Grok Groq NVIDIA NIM Ollama OpenAI OpenPipe OpenRouter Perplexity Qwen SambaNova Together AI Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview PerplexityLLMService provides access to Perplexityβs language models through an OpenAI-compatible interface. It inherits from OpenAILLMService and supports streaming responses and context management, with special handling for Perplexityβs incremental token reporting. API Reference Complete API documentation and method details Perplexity Docs Official Perplexity API documentation and features Example Code Working example with search capabilities Unlike other LLM services, Perplexity does not support function calling. Instead, they offer native internet search built in without requiring special function calls. β Installation To use PerplexityLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[perplexity]" Youβll also need to set up your Perplexity API key as an environment variable: PERPLEXITY_API_KEY . Get your API key from Perplexity API . β Frames β Input OpenAILLMContextFrame - Conversation context and history LLMMessagesFrame - Direct message list LLMUpdateSettingsFrame - Runtime parameter updates β Output LLMFullResponseStartFrame / LLMFullResponseEndFrame - Response boundaries LLMTextFrame - Streamed completion chunks with citations ErrorFrame - API or processing errors β Context Management Context Management Guide Learn how to manage conversation context, handle message history, and integrate context aggregators for consistent conversational experiences. β Usage Example Copy Ask AI import os from pipecat.services.perplexity.llm import PerplexityLLMService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext # Configure Perplexity service llm = PerplexityLLMService( api_key = os.getenv( "PERPLEXITY_API_KEY" ), model = "sonar-pro" , # Pro model for enhanced capabilities params = PerplexityLLMService.InputParams( temperature = 0.7 , max_tokens = 1000 ) ) # Create context optimized for search and current information context = OpenAILLMContext( messages = [ { "role" : "system" , "content" : """You are a knowledgeable assistant with access to real-time information. When answering questions, use your search capabilities to provide current, accurate information. Always cite your sources when possible. 
Keep responses concise for voice output.""" } ] ) # Create context aggregators context_aggregator = llm.create_context_aggregator(context) # Use in pipeline for information-rich conversations pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), llm, # Will automatically search and cite sources tts, transport.output(), context_aggregator.assistant() ]) # Enable metrics with special TTFB reporting for Perplexity task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True , report_only_initial_ttfb = True , # Optimized for Perplexity's response pattern ) ) β Metrics The service provides specialized token tracking for Perplexityβs incremental reporting: Time to First Byte (TTFB) - Response latency measurement Processing Duration - Total request processing time Token Usage - Accumulated prompt and completion tokens Enable with: Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( enable_metrics = True , enable_usage_metrics = True , ) ) β Additional Notes No Function Calling : Perplexity doesnβt support traditional function calling but provides superior built-in search Real-time Data : Access to current information without complex function orchestration Source Citations : Automatic citation of web sources in responses OpenAI Compatible : Uses familiar OpenAI-style interface and parameters OpenRouter Qwen On this page Overview Installation Frames Input Output Context Management Usage Example Metrics Additional Notes Assistant Responses are generated using AI and may contain mistakes.
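Because Perplexity does not support function calling, the context can be created without a tools schema; search and citations happen inside the model. A minimal sketch based on the usage example above:

# Sketch: Perplexity service and context with no tools schema.
import os
from pipecat.services.perplexity.llm import PerplexityLLMService
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

llm = PerplexityLLMService(
    api_key=os.getenv("PERPLEXITY_API_KEY"),
    model="sonar-pro",
)

context = OpenAILLMContext(
    messages=[{"role": "system", "content": "Answer briefly and cite sources."}]
    # no tools= argument: function calling is not supported by this service
)
context_aggregator = llm.create_context_aggregator(context)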
|
observers_turn-tracking-observer_c6e5fbb9.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/observers/turn-tracking-observer#turn-lifecycle
+Title: Turn Tracking Observer - Pipecat
+==================================================
+
Turn Tracking Observer - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Observers Turn Tracking Observer Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Observer Pattern Debug Observer LLM Observer Transcription Observer Turn Tracking Observer Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline The TurnTrackingObserver monitors and tracks conversational turns in your Pipecat pipeline, providing events when turns start and end. It intelligently identifies when a user-bot interaction cycle begins and completes. β Turn Lifecycle A turn represents a complete user-bot interaction cycle: Start : When the user starts speaking (or pipeline starts for first turn) Processing : User speaks, bot processes and responds End : After the bot finishes speaking and either: The user starts speaking again A timeout period elapses with no further activity β Events The observer emits two main events: on_turn_started : When a new turn begins Parameters: turn_number (int) on_turn_ended : When a turn completes Parameters: turn_number (int), duration (float, in seconds), was_interrupted (bool) β Usage The observer is automatically created when you initialize a PipelineTask with enable_turn_tracking=True (which is the default): Copy Ask AI task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True ), # Turn tracking is enabled by default ) # Access the observer turn_observer = task.turn_tracking_observer # Register event handlers @turn_observer.event_handler ( "on_turn_started" ) async def on_turn_started ( observer , turn_number ): logger.info( f "Turn { turn_number } started" ) @turn_observer.event_handler ( "on_turn_ended" ) async def on_turn_ended ( observer , turn_number , duration , was_interrupted ): status = "interrupted" if was_interrupted else "completed" logger.info( f "Turn { turn_number } { status } in { duration :.2f} s" ) β Configuration You can configure the observerβs behavior when creating a PipelineTask : Copy Ask AI from pipecat.observers.turn_tracking_observer import TurnTrackingObserver # Create a custom observer instance custom_turn_tracker = TurnTrackingObserver( turn_end_timeout_secs = 3.5 , # Turn end timeout (default: 2.5) ) # Add it as a regular observer task = PipelineTask( pipeline, observers = [custom_turn_tracker], # Disable the default one if adding your own enable_turn_tracking = False , ) β Interruptions The observer automatically detects interruptions when the user starts speaking while the bot is still speaking. 
In this case: The current turn is marked as interrupted ( was_interrupted=True ) A new turn begins immediately β How It Works The observer monitors specific frame types to track conversation flow: StartFrame : Initiates the first turn UserStartedSpeakingFrame : Starts user speech or triggers a new turn BotStartedSpeakingFrame : Marks bot speech beginning BotStoppedSpeakingFrame : Starts the turn end timeout After a bot stops speaking, the observer waits for the configured timeout period. If no further bot speech occurs, the turn ends; otherwise, it continues as part of the same turn. β Use Cases Analytics : Measure turn durations, interruption rates, and conversation flow Logging : Record turn-based logs for diagnostics and analysis Visualization : Show turn-based conversation timelines in UIs Tracing : Group spans and metrics by conversation turns Transcription Observer Daily REST Helper On this page Turn Lifecycle Events Usage Configuration Interruptions How It Works Use Cases Assistant Responses are generated using AI and may contain mistakes.
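Building on the analytics use case, a small sketch that accumulates turn statistics from the documented on_turn_ended event. It assumes the turn_observer from the usage example above and loguru's logger:

# Sketch: minimal in-memory analytics on top of the documented turn events,
# tracking average turn duration and interruption count.
from loguru import logger

turn_durations: list[float] = []
interrupted = 0

@turn_observer.event_handler("on_turn_ended")
async def collect_turn_stats(observer, turn_number, duration, was_interrupted):
    global interrupted
    turn_durations.append(duration)
    if was_interrupted:
        interrupted += 1
    avg = sum(turn_durations) / len(turn_durations)
    logger.info(f"{len(turn_durations)} turns, avg {avg:.2f}s, {interrupted} interrupted")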
|
pipecat-transport-openai-realtime-webrtc_indexhtml_aa5b4452.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/client/android/pipecat-transport-openai-realtime-webrtc/index.html
+Title: Overview - Pipecat
+==================================================
+
Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
|
pipeline_pipeline-task_23275744.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/pipeline/pipeline-task#param-additional-span-attributes
+Title: PipelineTask - Pipecat
+==================================================
+
PipelineTask - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Pipeline PipelineTask Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview PipelineTask is the central class for managing pipeline execution. It handles the lifecycle of the pipeline, processes frames in both directions, manages task cancellation, and provides event handlers for monitoring pipeline activity. β Basic Usage Copy Ask AI from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineParams, PipelineTask # Create a pipeline pipeline = Pipeline([ ... ]) # Create a task with the pipeline task = PipelineTask(pipeline) # Queue frames for processing await task.queue_frame(TTSSpeakFrame( "Hello, how can I help you today?" )) # Run the pipeline runner = PipelineRunner() await runner.run(task) β Constructor Parameters β pipeline BasePipeline required The pipeline to execute. β params PipelineParams default: "PipelineParams()" Configuration parameters for the pipeline. See PipelineParams for details. β observers List[BaseObserver] default: "[]" List of observers for monitoring pipeline execution. See Observers for details. β clock BaseClock default: "SystemClock()" Clock implementation for timing operations. β task_manager Optional[BaseTaskManager] default: "None" Custom task manager for handling asyncio tasks. If None, a default TaskManager is used. β check_dangling_tasks bool default: "True" Whether to check for processorsβ tasks finishing properly. β idle_timeout_secs Optional[float] default: "300" Timeout in seconds before considering the pipeline idle. Set to None to disable idle detection. See Pipeline Idle Detection for details. β idle_timeout_frames Tuple[Type[Frame], ...] default: "(BotSpeakingFrame, LLMFullResponseEndFrame)" Frame types that should prevent the pipeline from being considered idle. See Pipeline Idle Detection for details. β cancel_on_idle_timeout bool default: "True" Whether to automatically cancel the pipeline task when idle timeout is reached. See Pipeline Idle Detection for details. β enable_tracing bool default: "False" Whether to enable OpenTelemetry tracing. See The OpenTelemetry guide for details. β enable_turn_tracking bool default: "False" Whether to enable turn tracking. See The OpenTelemetry guide for details. β conversation_id Optional[str] default: "None" Custom ID for the conversation. If not provided, a UUID will be generated. See The OpenTelemetry guide for details. β additional_span_attributes Optional[dict] default: "None" Any additional attributes to add to top-level OpenTelemetry conversation span. See The OpenTelemetry guide for details. β Methods β Task Lifecycle Management β run() async Starts and manages the pipeline execution until completion or cancellation. 
Copy Ask AI await task.run() β stop_when_done() async Sends an EndFrame to the pipeline to gracefully stop the task after all queued frames have been processed. Copy Ask AI await task.stop_when_done() β cancel() async Stops the running pipeline immediately by sending a CancelFrame. Copy Ask AI await task.cancel() β has_finished() bool Returns whether the task has finished (all processors have stopped). Copy Ask AI if task.has_finished(): print ( "Task is complete" ) β Frame Management β queue_frame() async Queues a single frame to be pushed down the pipeline. Copy Ask AI await task.queue_frame(TTSSpeakFrame( "Hello!" )) β queue_frames() async Queues multiple frames to be pushed down the pipeline. Copy Ask AI frames = [TTSSpeakFrame( "Hello!" ), TTSSpeakFrame( "How are you?" )] await task.queue_frames(frames) β Event Handlers PipelineTask provides an event handler that can be registered using the event_handler decorator: β on_idle_timeout Triggered when no activity frames (as specified by idle_timeout_frames ) have been received within the idle timeout period. Copy Ask AI @task.event_handler ( "on_idle_timeout" ) async def on_idle_timeout ( task ): print ( "Pipeline has been idle too long" ) await task.queue_frame(TTSSpeakFrame( "Are you still there?" )) PipelineParams Pipeline Idle Detection On this page Overview Basic Usage Constructor Parameters Methods Task Lifecycle Management Frame Management Event Handlers on_idle_timeout Assistant Responses are generated using AI and may contain mistakes.
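Combining the pieces above, a sketch that nudges the user after the first idle timeout and ends the task on the second. It assumes an existing pipeline and sets cancel_on_idle_timeout=False so the handler stays in control:

# Sketch: custom idle handling built from the documented constructor parameters,
# queue_frame(), stop_when_done(), and the on_idle_timeout event.
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(allow_interruptions=True),
    idle_timeout_secs=120,
    cancel_on_idle_timeout=False,  # handle idle ourselves instead of auto-cancelling
)

idle_strikes = 0

@task.event_handler("on_idle_timeout")
async def handle_idle(task):
    global idle_strikes
    idle_strikes += 1
    if idle_strikes == 1:
        await task.queue_frame(TTSSpeakFrame("Are you still there?"))
    else:
        await task.stop_when_done()  # gracefully end after queued frames drain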
|
pipeline_pipeline-task_3534b9ca.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/pipeline/pipeline-task#overview
+Title: PipelineTask - Pipecat
+==================================================
+
PipelineTask - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Pipeline PipelineTask Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview PipelineTask is the central class for managing pipeline execution. It handles the lifecycle of the pipeline, processes frames in both directions, manages task cancellation, and provides event handlers for monitoring pipeline activity. β Basic Usage Copy Ask AI from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineParams, PipelineTask # Create a pipeline pipeline = Pipeline([ ... ]) # Create a task with the pipeline task = PipelineTask(pipeline) # Queue frames for processing await task.queue_frame(TTSSpeakFrame( "Hello, how can I help you today?" )) # Run the pipeline runner = PipelineRunner() await runner.run(task) β Constructor Parameters β pipeline BasePipeline required The pipeline to execute. β params PipelineParams default: "PipelineParams()" Configuration parameters for the pipeline. See PipelineParams for details. β observers List[BaseObserver] default: "[]" List of observers for monitoring pipeline execution. See Observers for details. β clock BaseClock default: "SystemClock()" Clock implementation for timing operations. β task_manager Optional[BaseTaskManager] default: "None" Custom task manager for handling asyncio tasks. If None, a default TaskManager is used. β check_dangling_tasks bool default: "True" Whether to check for processorsβ tasks finishing properly. β idle_timeout_secs Optional[float] default: "300" Timeout in seconds before considering the pipeline idle. Set to None to disable idle detection. See Pipeline Idle Detection for details. β idle_timeout_frames Tuple[Type[Frame], ...] default: "(BotSpeakingFrame, LLMFullResponseEndFrame)" Frame types that should prevent the pipeline from being considered idle. See Pipeline Idle Detection for details. β cancel_on_idle_timeout bool default: "True" Whether to automatically cancel the pipeline task when idle timeout is reached. See Pipeline Idle Detection for details. β enable_tracing bool default: "False" Whether to enable OpenTelemetry tracing. See The OpenTelemetry guide for details. β enable_turn_tracking bool default: "False" Whether to enable turn tracking. See The OpenTelemetry guide for details. β conversation_id Optional[str] default: "None" Custom ID for the conversation. If not provided, a UUID will be generated. See The OpenTelemetry guide for details. β additional_span_attributes Optional[dict] default: "None" Any additional attributes to add to top-level OpenTelemetry conversation span. See The OpenTelemetry guide for details. β Methods β Task Lifecycle Management β run() async Starts and manages the pipeline execution until completion or cancellation. 
Copy Ask AI await task.run() β stop_when_done() async Sends an EndFrame to the pipeline to gracefully stop the task after all queued frames have been processed. Copy Ask AI await task.stop_when_done() β cancel() async Stops the running pipeline immediately by sending a CancelFrame. Copy Ask AI await task.cancel() β has_finished() bool Returns whether the task has finished (all processors have stopped). Copy Ask AI if task.has_finished(): print ( "Task is complete" ) β Frame Management β queue_frame() async Queues a single frame to be pushed down the pipeline. Copy Ask AI await task.queue_frame(TTSSpeakFrame( "Hello!" )) β queue_frames() async Queues multiple frames to be pushed down the pipeline. Copy Ask AI frames = [TTSSpeakFrame( "Hello!" ), TTSSpeakFrame( "How are you?" )] await task.queue_frames(frames) β Event Handlers PipelineTask provides an event handler that can be registered using the event_handler decorator: β on_idle_timeout Triggered when no activity frames (as specified by idle_timeout_frames ) have been received within the idle timeout period. Copy Ask AI @task.event_handler ( "on_idle_timeout" ) async def on_idle_timeout ( task ): print ( "Pipeline has been idle too long" ) await task.queue_frame(TTSSpeakFrame( "Are you still there?" )) PipelineParams Pipeline Idle Detection On this page Overview Basic Usage Constructor Parameters Methods Task Lifecycle Management Frame Management Event Handlers on_idle_timeout Assistant Responses are generated using AI and may contain mistakes.
|
react_components_0efbf3cd.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/client/react/components#param-children
+Title: Components - Pipecat
+==================================================
+
Components - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation API Reference Components Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference Components Hooks React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat React SDK provides several components for handling audio, video, and visualization in your application. β PipecatClientProvider The root component for providing Pipecat client context to your application. Copy Ask AI < PipecatClientProvider client = { pcClient } > { /* Child components */ } </ PipecatClientProvider > Props β client PipecatClient required A singleton instance of PipecatClient β PipecatClientAudio Creates a new <audio> element that mounts the botβs audio track. Copy Ask AI < PipecatClientAudio /> Props No props required β PipecatClientVideo Creates a new <video> element that renders either the bot or local participantβs video track. Copy Ask AI < PipecatClientVideo participant = "local" fit = "cover" mirror onResize = { ({ aspectRatio , height , width }) => { console . log ( "Video dimensions changed:" , { aspectRatio , height , width }); } } /> Props β participant ('local' | 'bot') required Defines which participantβs video track is rendered β fit ('contain' | 'cover') Defines whether the video should be fully contained or cover the box. Default: βcontainβ β mirror boolean Forces the video to be mirrored, if set β onResize(dimensions: object) function Triggered whenever the videoβs rendered width or height changes β PipecatClientCamToggle A headless component to read and set the local participantβs camera state. Copy Ask AI < PipecatClientCamToggle onCamEnabledChanged = { ( enabled ) => console . log ( "Camera enabled:" , enabled ) } disabled = { false } > { ({ disabled , isCamEnabled , onClick }) => ( < button disabled = { disabled } onClick = { onClick } > { isCamEnabled ? "Disable Camera" : "Enable Camera" } </ button > ) } </ PipecatClientCamToggle > Props β onCamEnabledChanged(enabled: boolean) function Triggered whenever the local participantβs camera state changes β disabled boolean If true, the component will not allow toggling the camera state. Default: false β children function A render prop that provides state and handlers to the children β PipecatClientMicToggle A headless component to read and set the local participantβs microphone state. Copy Ask AI < PipecatClientMicToggle onMicEnabledChanged = { ( enabled ) => console . log ( "Microphone enabled:" , enabled ) } disabled = { false } > { ({ disabled , isMicEnabled , onClick }) => ( < button disabled = { disabled } onClick = { onClick } > { isMicEnabled ? "Disable Microphone" : "Enable Microphone" } </ button > ) } </ PipecatClientMicToggle > Props β onMicEnabledChanged(enabled: boolean) function Triggered whenever the local participantβs microphone state changes β disabled boolean If true, the component will not allow toggling the microphone state. Default: false β children function A render prop that provides state and handlers to the children β VoiceVisualizer Renders a visual representation of audio input levels on a <canvas> element. 
Copy Ask AI < VoiceVisualizer participantType = "local" backgroundColor = "white" barColor = "black" barGap = { 1 } barWidth = { 4 } barMaxHeight = { 24 } /> Props β participantType string required The participant type to visualize audio for β backgroundColor string The background color of the canvas. Default: βtransparentβ β barColor string The color of the audio level bars. Default: βblackβ β barCount number The number of bars to display. Default: 5 β barGap number The gap between bars in pixels. Default: 12 β barWidth number The width of each bar in pixels. Default: 30 β barMaxHeight number The maximum height at full volume of each bar in pixels. Default: 120 SDK Introduction Hooks On this page PipecatClientProvider PipecatClientAudio PipecatClientVideo PipecatClientCamToggle PipecatClientMicToggle VoiceVisualizer Assistant Responses are generated using AI and may contain mistakes.
|
react_hooks_54f22d4c.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/client/react/hooks#param-track-type
+Title: Hooks - Pipecat
+==================================================
+
Hooks - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation API Reference Hooks Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference Components Hooks React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat React SDK provides hooks for accessing client functionality, managing media devices, and handling events. β usePipecatClient Provides access to the PipecatClient instance originally passed to PipecatClientProvider . Copy Ask AI import { usePipecatClient } from "@pipecat-ai/client-react" ; function MyComponent () { const pcClient = usePipecatClient (); await pcClient . connect ({ endpoint: 'https://your-pipecat-api-url/connect' , requestData: { // Any custom data your /connect endpoint requires } }); } β useRTVIClientEvent Allows subscribing to RTVI client events. It is advised to wrap handlers with useCallback . Copy Ask AI import { useCallback } from "react" ; import { RTVIEvent , TransportState } from "@pipecat-ai/client-js" ; import { useRTVIClientEvent } from "@pipecat-ai/client-react" ; function EventListener () { useRTVIClientEvent ( RTVIEvent . TransportStateChanged , useCallback (( transportState : TransportState ) => { console . log ( "Transport state changed to" , transportState ); }, []) ); } Arguments β event RTVIEvent required β handler function required β usePipecatClientMediaDevices Manage and list available media devices. Copy Ask AI import { usePipecatClientMediaDevices } from "@pipecat-ai/client-react" ; function DeviceSelector () { const { availableCams , availableMics , selectedCam , selectedMic , updateCam , updateMic , } = usePipecatClientMediaDevices (); return ( <> < select name = "cam" onChange = { ( ev ) => updateCam ( ev . target . value ) } value = { selectedCam ?. deviceId } > { availableCams . map (( cam ) => ( < option key = { cam . deviceId } value = { cam . deviceId } > { cam . label } </ option > )) } </ select > < select name = "mic" onChange = { ( ev ) => updateMic ( ev . target . value ) } value = { selectedMic ?. deviceId } > { availableMics . map (( mic ) => ( < option key = { mic . deviceId } value = { mic . deviceId } > { mic . label } </ option > )) } </ select > </> ); } β usePipecatClientMediaTrack Access audio and video tracks. Copy Ask AI import { usePipecatClientMediaTrack } from "@pipecat-ai/client-react" ; function MyTracks () { const localAudioTrack = usePipecatClientMediaTrack ( "audio" , "local" ); const botAudioTrack = usePipecatClientMediaTrack ( "audio" , "bot" ); } Arguments β trackType 'audio' | 'video' required β participantType 'bot' | 'local' required β usePipecatClientTransportState Returns the current transport state. Copy Ask AI import { usePipecatClientTransportState } from "@pipecat-ai/client-react" ; function ConnectionStatus () { const transportState = usePipecatClientTransportState (); } β usePipecatClientCamControl Controls the local participantβs camera state. Copy Ask AI import { usePipecatClientCamControl } from "@pipecat-ai/client-react" ; function CamToggle () { const { enableCam , isCamEnabled } = usePipecatClientCamControl (); return ( < button onClick = { () => enableCam ( ! isCamEnabled ) } > { isCamEnabled ? 
"Disable Camera" : "Enable Camera" } </ button > ); } β usePipecatClientMicControl Controls the local participantβs microphone state. Copy Ask AI import { usePipecatClientMicControl } from "@pipecat-ai/client-react" ; function MicToggle () { const { enableMic , isMicEnabled } = usePipecatClientMicControl (); return ( < button onClick = { () => enableMic ( ! isMicEnabled ) } > { isMicEnabled ? "Disable Microphone" : "Enable Microphone" } </ button > ); } Components SDK Introduction On this page usePipecatClient useRTVIClientEvent usePipecatClientMediaDevices usePipecatClientMediaTrack usePipecatClientTransportState usePipecatClientCamControl usePipecatClientMicControl Assistant Responses are generated using AI and may contain mistakes.
|
react_introduction_470e3f47.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/client/react/introduction#example
+Title: SDK Introduction - Pipecat
+==================================================
+
SDK Introduction - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation React SDK SDK Introduction Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat React SDK provides React-specific components and hooks for building voice and multimodal AI applications. It wraps the core JavaScript SDK functionality in an idiomatic React interface that handles: React context for client state management Components for audio and video rendering Hooks for accessing client functionality Media device management Event handling through hooks β Installation Install the SDK, core client, and a transport implementation (e.g. Daily for WebRTC): Copy Ask AI npm install @pipecat-ai/client-js npm install @pipecat-ai/client-react npm install @pipecat-ai/daily-transport β Example Hereβs a simple example using Daily as the transport layer: Copy Ask AI import { PipecatClient } from "@pipecat-ai/client-js" ; import { PipecatClientProvider , PipecatClientAudio , usePipecatClient , } from "@pipecat-ai/client-react" ; import { DailyTransport } from "@pipecat-ai/daily-transport" ; // Create the client instance const client = new PipecatClient ({ transport: new DailyTransport (), enableMic: true , }); // Root component wraps the app with the provider function App () { return ( < PipecatClientProvider client = { client } > < VoiceBot /> < PipecatClientAudio /> </ PipecatClientProvider > ); } // Component using the client function VoiceBot () { const client = usePipecatClient (); const handleClick = async () => { await client . connect ({ endpoint: ` ${ process . env . PIPECAT_API_URL || "/api" } /connect` }); }; return ( < button onClick = { handleClick } > Start Conversation </ button > ; ); } β Explore the SDK Components Ready-to-use components for audio, video, and visualization Hooks React hooks for accessing client functionality The Pipecat React SDK builds on top of the JavaScript SDK to provide an idiomatic React interface while maintaining compatibility with the RTVI standard. OpenAIRealTimeWebRTCTransport Components On this page Installation Example Explore the SDK Assistant Responses are generated using AI and may contain mistakes.
|
s2s_aws_15f3d046.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/s2s/aws#param-input-sample-size
+Title: AWS Nova Sonic - Pipecat
+==================================================
+
AWS Nova Sonic - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Speech-to-Speech AWS Nova Sonic Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech AWS Nova Sonic Gemini Multimodal Live OpenAI Realtime Beta Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline The AWSNovaSonicLLMService enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences. It provides: Real-time Interaction Stream audio in real-time with low latency response times Speech Processing Built-in speech-to-text and text-to-speech capabilities with multiple voice options Voice Activity Detection Automatic detection of speech start/stop for natural conversations Context Management Intelligent handling of conversation history and system instructions β Installation To use AWSNovaSonicLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[aws-nova-sonic]" We recommend setting up your AWS credentials as environment variables, as youβll need them to initialize AWSNovaSonicLLMService : AWS_SECRET_ACCESS_KEY AWS_ACCESS_KEY_ID AWS_REGION β Basic Usage Hereβs a simple example of setting up a conversational AI bot with AWS Nova Sonic: Copy Ask AI from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService llm = AWSNovaSonicLLMService( secret_access_key = os.getenv( "AWS_SECRET_ACCESS_KEY" ), access_key_id = os.getenv( "AWS_ACCESS_KEY_ID" ), region = os.getenv( "AWS_REGION" ) voice_id = "tiffany" , # Voices: matthew, tiffany, amy ) β Configuration β Constructor Parameters β secret_access_key str required Your AWS secret access key β access_key_id str required Your AWS access key ID β region str required Specify the AWS region for the service (e.g., "us-east-1" ). Note that the service may not be available in all AWS regions: check the AWS Bedrock User Guideβs support table . β model str default: "amazon.nova-sonic-v1:0" AWS Nova Sonic model to use. Note that "amazon.nova-sonic-v1:0" is the only supported model as of 2025-05-08. β voice_id str default: "matthew" Voice for text-to-speech (options: "matthew" , "tiffany" , "amy" ) β params Params Configuration for model parameters β system_instruction str High-level instructions that guide the modelβs behavior. Note that more commonly these instructions will be included as part of the context provided to kick off the conversation. β tools ToolsSchema List of function definitions for tool/function calling. Note that more commonly tools will be included as part of the context provided to kick off the conversation. β send_transcription_frames bool default: "True" Whether to emit transcription frames β Model Parameters The Params object configures the behavior of the AWS Nova Sonic model. 
It is strongly recommended to stick with default values (most easily by omitting params when constructing AWSNovaSonicLLMService ) unless you have a good understanding of the parameters and their impact. Deviating from the defaults may lead to unexpected behavior. β temperature float default: "0.7" Controls randomness in responses. Range: 0.0 to 2.0 β max_tokens int default: "1024" Maximum number of tokens to generate β top_p float default: "0.9" Cumulative probability cutoff for token selection. Range: 0.0 to 1.0 β input_sample_rate int default: "16000" Sample rate for input audio β output_sample_rate int default: "24000" Sample rate for output audio β input_sample_size int default: "16" Bit depth for input audio β input_channel_count int default: "1" Number of channels for input audio β output_sample_size int default: "16" Bit depth for output audio β output_channel_count int default: "1" Number of channels for output audio β Frame Types β Input Frames β InputAudioRawFrame Frame Raw audio data for speech input β OpenAILLMContextFrame Frame Contains conversation context β BotStoppedSpeakingFrame Frame Signals the bot has stopped speaking β Output Frames β TTSAudioRawFrame Frame Generated speech audio β LLMFullResponseStartFrame Frame Signals the start of a response from the LLM β LLMFullResponseEndFrame Frame Signals the end of a response from the LLM β TTSStartedFrame Frame Signals start of speech synthesis (coincides with the start of the LLM response, as this is a speech-to-speech model) β TTSStoppedFrame Frame Signals end of speech synthesis (coincides with the end of the LLM response, as this is a speech-to-speech model) β LLMTextFrame Frame Generated text responses from the LLM β TTSTextFrame Frame Generated text responses β TranscriptionFrame Frame Speech transcriptions. Only output if send_transcription_frames is True . β Function Calling This service supports function calling (also known as tool calling) which allows the LLM to request information from external services and APIs. For example, you can enable your bot to: Check current weather conditions Query databases Access external APIs Perform custom actions See the Function Calling guide for: Detailed implementation instructions Provider-specific function definitions Handler registration examples Control over function call behavior Complete usage examples β Next Steps β Examples Foundational Example Basic implementation showing core features Persistent Content Example Implementation showing saving and loading conversation history XTTS Gemini Multimodal Live On this page Installation Basic Usage Configuration Constructor Parameters Model Parameters Frame Types Input Frames Output Frames Function Calling Next Steps Examples Assistant Responses are generated using AI and may contain mistakes.
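For reference, a sketch of a complete service construction with explicit keyword arguments and params omitted, which keeps the recommended default model parameters:

# Sketch: full constructor call using the documented required arguments.
import os
from pipecat.services.aws_nova_sonic.aws import AWSNovaSonicLLMService

llm = AWSNovaSonicLLMService(
    secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    region=os.getenv("AWS_REGION"),
    voice_id="tiffany",  # other documented voices: matthew, amy
)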
|
s2s_gemini_13c30d1c.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/s2s/gemini#param-max-tokens
+Title: Gemini Multimodal Live - Pipecat
+==================================================
+
Gemini Multimodal Live - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Speech-to-Speech Gemini Multimodal Live Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech AWS Nova Sonic Gemini Multimodal Live OpenAI Realtime Beta Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline The GeminiMultimodalLiveLLMService enables natural, real-time conversations with Googleβs Gemini model. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences. It provides: Real-time Interaction Stream audio and video in real-time with low latency response times Speech Processing Built-in speech-to-text and text-to-speech capabilities with multiple voice options Voice Activity Detection Automatic detection of speech start/stop for natural conversations Context Management Intelligent handling of conversation history and system instructions Want to start building? Check out our Gemini Multimodal Live Guide . β Installation To use GeminiMultimodalLiveLLMService , install the required dependencies: Copy Ask AI pip install "pipecat-ai[google]" Youβll need to set up your Google API key as an environment variable: GOOGLE_API_KEY . β Basic Usage Hereβs a simple example of setting up a conversational AI bot with Gemini Multimodal Live: Copy Ask AI from pipecat.services.gemini_multimodal_live.gemini import ( GeminiMultimodalLiveLLMService, InputParams, GeminiMultimodalModalities ) llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), voice_id = "Aoede" , # Voices: Aoede, Charon, Fenrir, Kore, Puck params = InputParams( temperature = 0.7 , # Set model input params language = Language. EN_US , # Set language (30+ languages supported) modalities = GeminiMultimodalModalities. AUDIO # Response modality ) ) β Configuration β Constructor Parameters β api_key str required Your Google API key β base_url str API endpoint URL β model str Gemini model to use (upgraded to new v1beta model) β voice_id str default: "Charon" Voice for text-to-speech (options: Aoede, Charon, Fenrir, Kore, Puck) Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), voice_id = "Puck" , # Choose your preferred voice ) β system_instruction str High-level instructions that guide the modelβs behavior Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GOOGLE_API_KEY" ), system_instruction = "Talk like a pirate." , ) β start_audio_paused bool default: "False" Whether to start with audio input paused β start_video_paused bool default: "False" Whether to start with video input paused β tools Union[List[dict], ToolsSchema] Tools/functions available to the model β inference_on_context_initialization bool default: "True" Whether to generate a response when context is first set β Input Parameters β frequency_penalty float default: "None" Penalizes repeated tokens. 
Configuration

Constructor Parameters

api_key (str, required)
Your Google API key.

base_url (str)
API endpoint URL.

model (str)
Gemini model to use (the default has been upgraded to the newer v1beta model).

voice_id (str, default "Charon")
Voice for text-to-speech (options: Aoede, Charon, Fenrir, Kore, Puck).

    llm = GeminiMultimodalLiveLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        voice_id="Puck",  # Choose your preferred voice
    )

system_instruction (str)
High-level instructions that guide the model's behavior.

    llm = GeminiMultimodalLiveLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        system_instruction="Talk like a pirate.",
    )

start_audio_paused (bool, default False)
Whether to start with audio input paused.

start_video_paused (bool, default False)
Whether to start with video input paused.

tools (Union[List[dict], ToolsSchema])
Tools/functions available to the model; see the sketch below and the Function Calling section later on this page.

inference_on_context_initialization (bool, default True)
Whether to generate a response when context is first set.
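As a quick illustration of the tools constructor parameter above, the sketch below declares a single function and registers a handler for it. It assumes the FunctionSchema and ToolsSchema helpers from pipecat.adapters and the register_function() method available on Pipecat LLM services; handler signatures vary between Pipecat releases, so treat this as a hedged outline and follow the Function Calling guide for the current recommended pattern.

    from pipecat.adapters.schemas.function_schema import FunctionSchema
    from pipecat.adapters.schemas.tools_schema import ToolsSchema

    # Describe the function so the model knows when and how to call it.
    weather_function = FunctionSchema(
        name="get_current_weather",
        description="Get the current weather for a location",
        properties={"location": {"type": "string", "description": "City and state"}},
        required=["location"],
    )

    llm = GeminiMultimodalLiveLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        tools=ToolsSchema(standard_tools=[weather_function]),
    )

    # The handler signature shown here is the long-standing positional form; newer
    # releases pass a single FunctionCallParams object instead. See the Function
    # Calling guide for the signature that matches your Pipecat version.
    async def fetch_weather(function_name, tool_call_id, args, llm, context, result_callback):
        # A real handler would call a weather API using args["location"].
        await result_callback({"conditions": "sunny", "temperature_f": 72})

    llm.register_function("get_current_weather", fetch_weather)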
Input Parameters

frequency_penalty (float, default None)
Penalizes repeated tokens. Range: 0.0 to 2.0.

max_tokens (int, default 4096)
Maximum number of tokens to generate.

modalities (GeminiMultimodalModalities, default AUDIO)
Response modalities to include (options: AUDIO, TEXT).

presence_penalty (float, default None)
Penalizes tokens based on their presence in the text. Range: 0.0 to 2.0.

temperature (float, default None)
Controls randomness in responses. Range: 0.0 to 2.0.

language (Language, default Language.EN_US)
Language for generation. Over 30 languages are supported.

media_resolution (GeminiMediaResolution, default UNSPECIFIED)
Controls image processing quality and token usage:
LOW: uses 64 tokens
MEDIUM: uses 256 tokens
HIGH: zoomed reframing with 256 tokens

vad (GeminiVADParams)
Voice Activity Detection configuration:
disabled: toggle VAD on/off
start_sensitivity: how quickly speech is detected (HIGH/LOW)
end_sensitivity: how quickly turns end after pauses (HIGH/LOW)
prefix_padding_ms: milliseconds of audio to keep before speech
silence_duration_ms: milliseconds of silence that ends a turn

    from pipecat.services.gemini_multimodal_live.events import (
        StartSensitivity,
        EndSensitivity,
    )
    from pipecat.services.gemini_multimodal_live.gemini import (
        GeminiVADParams,
        GeminiMediaResolution,
    )

    llm = GeminiMultimodalLiveLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        params=InputParams(
            temperature=0.7,
            language=Language.ES,  # Spanish language
            media_resolution=GeminiMediaResolution.HIGH,  # Higher quality image processing
            vad=GeminiVADParams(
                start_sensitivity=StartSensitivity.HIGH,  # Detect speech quickly
                end_sensitivity=EndSensitivity.LOW,       # Allow longer pauses
                prefix_padding_ms=300,                    # Keep 300ms before speech
                silence_duration_ms=1000,                 # End turn after 1s of silence
            ),
        ),
    )

top_k (int, default None)
Limits vocabulary to the k most likely tokens. Minimum: 0.

top_p (float, default None)
Cumulative probability cutoff for token selection. Range: 0.0 to 1.0.

context_window_compression (ContextWindowCompressionParams)
Parameters for managing the context window:
enabled: enable/disable compression (default: False)
trigger_tokens: number of tokens that triggers compression (default: None, which uses 80% of the context window)

    from pipecat.services.gemini_multimodal_live.gemini import (
        ContextWindowCompressionParams,
    )

    llm = GeminiMultimodalLiveLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        params=InputParams(
            top_p=0.9,  # More focused token selection
            top_k=40,   # Limit vocabulary options
            context_window_compression=ContextWindowCompressionParams(
                enabled=True,
                trigger_tokens=8000,  # Compress when reaching 8000 tokens
            ),
        ),
    )

Methods

set_audio_input_paused(paused: bool)
Pause or unpause audio input processing.

set_video_input_paused(paused: bool)
Pause or unpause video input processing.

set_model_modalities(modalities: GeminiMultimodalModalities)
Change the response modality (TEXT or AUDIO).

set_language(language: Language)
Change the language used for generation.

set_context(context: OpenAILLMContext)
Set the conversation context explicitly.

create_context_aggregator(context: OpenAILLMContext, user_params: LLMUserAggregatorParams, assistant_params: LLMAssistantAggregatorParams)
Create context aggregators for managing conversation state.

Frame Types

Input Frames

InputAudioRawFrame: raw audio data for speech input
InputImageRawFrame: raw image data for visual input
StartInterruptionFrame: signals the start of a user interruption
UserStartedSpeakingFrame: signals that the user started speaking
UserStoppedSpeakingFrame: signals that the user stopped speaking
OpenAILLMContextFrame: contains conversation context
LLMMessagesAppendFrame: adds messages to the conversation
LLMUpdateSettingsFrame: updates LLM settings
LLMSetToolsFrame: sets available tools for the LLM

Output Frames

TTSAudioRawFrame: generated speech audio
TTSStartedFrame: signals the start of speech synthesis
TTSStoppedFrame: signals the end of speech synthesis
LLMTextFrame: generated text responses from the LLM
TTSTextFrame: text used for speech synthesis
TranscriptionFrame: speech transcriptions from user audio
LLMFullResponseStartFrame: signals the start of a complete LLM response
LLMFullResponseEndFrame: signals the end of a complete LLM response
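Input frames are normally produced by the transport and context aggregators, but they can also be queued directly from application code, for example to inject a message into the conversation. A minimal sketch, assuming the pipeline task created in the earlier sketch and the LLMMessagesAppendFrame from pipecat.frames.frames:

    from pipecat.frames.frames import LLMMessagesAppendFrame

    # Inside an async function or event handler: queue a frame onto the running
    # pipeline task so it flows downstream like any other input frame.
    await task.queue_frames(
        [
            LLMMessagesAppendFrame(
                messages=[{"role": "user", "content": "Please summarize what we discussed."}]
            )
        ]
    )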
Function Calling

This service supports function calling (also known as tool calling), which allows the LLM to request information from external services and APIs. For example, you can enable your bot to:

Check current weather conditions
Query databases
Access external APIs
Perform custom actions

See the Function Calling guide for:

Detailed implementation instructions
Provider-specific function definitions
Handler registration examples
Control over function call behavior
Complete usage examples

Token Usage Tracking

Gemini Multimodal Live automatically tracks token usage metrics, providing:

Prompt token counts
Completion token counts
Total token counts
Detailed token breakdowns by modality (text, audio)

These metrics can be used for monitoring usage, optimizing costs, and understanding model performance.

Language Support

Gemini Multimodal Live supports the following languages:

Language Code       Description             Gemini Code
Language.AR         Arabic                  ar-XA
Language.BN_IN      Bengali (India)         bn-IN
Language.CMN_CN     Chinese (Mandarin)      cmn-CN
Language.DE_DE      German (Germany)        de-DE
Language.EN_US      English (US)            en-US
Language.EN_AU      English (Australia)     en-AU
Language.EN_GB      English (UK)            en-GB
Language.EN_IN      English (India)         en-IN
Language.ES_ES      Spanish (Spain)         es-ES
Language.ES_US      Spanish (US)            es-US
Language.FR_FR      French (France)         fr-FR
Language.FR_CA      French (Canada)         fr-CA
Language.GU_IN      Gujarati (India)        gu-IN
Language.HI_IN      Hindi (India)           hi-IN
Language.ID_ID      Indonesian              id-ID
Language.IT_IT      Italian (Italy)         it-IT
Language.JA_JP      Japanese (Japan)        ja-JP
Language.KN_IN      Kannada (India)         kn-IN
Language.KO_KR      Korean (Korea)          ko-KR
Language.ML_IN      Malayalam (India)       ml-IN
Language.MR_IN      Marathi (India)         mr-IN
Language.NL_NL      Dutch (Netherlands)     nl-NL
Language.PL_PL      Polish (Poland)         pl-PL
Language.PT_BR      Portuguese (Brazil)     pt-BR
Language.RU_RU      Russian (Russia)        ru-RU
Language.TA_IN      Tamil (India)           ta-IN
Language.TE_IN      Telugu (India)          te-IN
Language.TH_TH      Thai (Thailand)         th-TH
Language.TR_TR      Turkish (Turkey)        tr-TR
Language.VI_VN      Vietnamese (Vietnam)    vi-VN

You can set the language using the language parameter:

    from pipecat.transcriptions.language import Language
    from pipecat.services.gemini_multimodal_live.gemini import (
        GeminiMultimodalLiveLLMService,
        InputParams,
    )

    # Set language during initialization
    llm = GeminiMultimodalLiveLLMService(
        api_key=os.getenv("GOOGLE_API_KEY"),
        params=InputParams(language=Language.ES_ES),  # Spanish (Spain)
    )

Next Steps

Examples

Foundational Example: basic implementation showing core features and transcription.
Simple Chatbot: a client/server example showing how to build a Pipecat JS or React client that connects to a Gemini Live Pipecat bot.

Learn More

Check out our Gemini Multimodal Live Guide for detailed explanations and best practices.
|
s2s_gemini_3e665166.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/s2s/gemini#param-llm-set-tools-frame
|
2 |
+
Title: Gemini Multimodal Live - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
|
s2s_gemini_4924f9a5.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/s2s/gemini#examples
|
2 |
+
Title: Gemini Multimodal Live - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
|
s2s_gemini_98435b26.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/services/s2s/gemini#param-tts-text-frame
|
2 |
+
Title: Gemini Multimodal Live - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
|