Add files using upload-large-folder tool
This view is limited to 50 files because it contains too many changes.
- analytics_sentry_cf02a907.txt +5 -0
- android_api-reference_d0f255ab.txt +5 -0
- android_introduction_7ffdd137.txt +5 -0
- audio_audio-buffer-processor_3a034d3c.txt +5 -0
- audio_krisp-filter_06d586db.txt +5 -0
- audio_krisp-filter_78ecbba3.txt +5 -0
- audio_silero-vad-analyzer_beb54155.txt +5 -0
- audio_silero-vad-analyzer_c604e722.txt +5 -0
- base-classes_media_68c24817.txt +5 -0
- base-classes_speech_3040381e.txt +5 -0
- c_transport_ae6d2316.txt +5 -0
- client_rtvi-standard_0f269efd.txt +5 -0
- client_rtvi-standard_10095bd8.txt +5 -0
- client_rtvi-standard_a3cec5f6.txt +5 -0
- client_rtvi-standard_de003d42.txt +5 -0
- client_rtvi-standard_f54a4952.txt +5 -0
- daily_rest-helpers_4cde2775.txt +5 -0
- daily_rest-helpers_a9e4a966.txt +5 -0
- daily_rest-helpers_c4920582.txt +5 -0
- daily_rest-helpers_e4cf41d3.txt +5 -0
- daily_rest-helpers_f6a78696.txt +5 -0
- deployment_cerebrium_b676596a.txt +5 -0
- deployment_fly_b244f040.txt +5 -0
- deployment_pattern_cabfd79c.txt +5 -0
- deployment_pattern_e013764a.txt +5 -0
- deployment_pipecat-cloud_9cbcfcad.txt +5 -0
- deployment_pipecat-cloud_a13c351c.txt +5 -0
- deployment_wwwflyio_fcbf6f91.txt +5 -0
- features_gemini-multimodal-live_0d50e7fb.txt +5 -0
- features_gemini-multimodal-live_37e7755d.txt +5 -0
- features_krisp_340dc07e.txt +5 -0
- features_pipecat-flows_07c1d855.txt +5 -0
- features_pipecat-flows_10d38465.txt +5 -0
- features_pipecat-flows_c7ec073f.txt +5 -0
- features_pipecat-flows_e0431dc6.txt +5 -0
- filters_identify-filter_8203164a.txt +5 -0
- filters_stt-mute_a3e0a3c4.txt +5 -0
- filters_stt-mute_e3c145a9.txt +5 -0
- flows_pipecat-flows_2ded68ca.txt +5 -0
- flows_pipecat-flows_3679056b.txt +5 -0
- flows_pipecat-flows_ac3a5ae7.txt +5 -0
- flows_pipecat-flows_b3264db4.txt +5 -0
- frame_producer-consumer_293a2488.txt +5 -0
- frame_producer-consumer_8615da3b.txt +5 -0
- fundamentals_context-management_3438fe83.txt +5 -0
- fundamentals_custom-frame-processor_4a4eb987.txt +5 -0
- fundamentals_detecting-user-idle_3b1dfd0f.txt +5 -0
- fundamentals_end-pipeline_2929a644.txt +5 -0
- fundamentals_function-calling_053f634a.txt +5 -0
- fundamentals_function-calling_1b56ceae.txt +5 -0
analytics_sentry_cf02a907.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/services/analytics/sentry#configuration
Title: Sentry Metrics - Pipecat
==================================================

Sentry Metrics - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Analytics & Monitoring Sentry Metrics Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Sentry Metrics Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview SentryMetrics extends FrameProcessorMetrics to provide performance monitoring integration with Sentry. It tracks Time to First Byte (TTFB) and processing duration metrics for frame processors. Installation To use Sentry metrics, install the Sentry SDK: Copy Ask AI pip install "pipecat-ai[sentry]" Configuration Sentry must be initialized in your application before metrics will be collected: Copy Ask AI import sentry_sdk sentry_sdk.init( dsn = "your-sentry-dsn" , traces_sample_rate = 1.0 , ) Usage Example Copy Ask AI import sentry_sdk from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.metrics.sentry import SentryMetrics from pipecat.transports.services.daily import DailyParams, DailyTransport async def create_metrics_pipeline (): sentry_sdk.init( dsn = "your-sentry-dsn" , traces_sample_rate = 1.0 , ) transport = DailyTransport( room_url, token, "Chatbot" , DailyParams( audio_out_enabled = True , audio_in_enabled = True , video_out_enabled = False , vad_analyzer = SileroVADAnalyzer(), transcription_enabled = True , ), ) tts = ElevenLabsTTSService( api_key = os.getenv( "ELEVENLABS_API_KEY" ), metrics = SentryMetrics(), ) llm = OpenAILLMService( api_key = os.getenv( "OPENAI_API_KEY" ), model = "gpt-4o" ), metrics = SentryMetrics(), ) messages = [ { "role" : "system" , "content" : "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself. Keep all your responses to 12 words or fewer." 
, }, ] context = OpenAILLMContext(messages) context_aggregator = llm.create_context_aggregator(context) # Use in pipeline pipeline = Pipeline([ transport.input(), context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant(), ]) Transaction Information Each transaction includes: Operation type ( ttfb or processing ) Description with processor name Start timestamp End timestamp Unique transaction ID Fallback Behavior If Sentry is not available (not installed or not initialized): Warning logs are generated Metric methods execute without error No data is sent to Sentry Notes Requires Sentry SDK to be installed and initialized Thread-safe metric collection Automatic transaction management Supports selective TTFB reporting Integrates with Sentry’s performance monitoring Provides detailed timing information Maintains timing data even when Sentry is unavailable Moondream Producer & Consumer Processors On this page Overview Installation Configuration Usage Example Transaction Information Fallback Behavior Notes Assistant Responses are generated using AI and may contain mistakes.
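As a condensed sketch of the configuration and usage example above (not itself part of the page), the following initializes Sentry once and attaches a SentryMetrics instance to the LLM and TTS services; SENTRY_DSN, OPENAI_API_KEY, and ELEVENLABS_API_KEY are assumed to be set in the environment.

import os

import sentry_sdk

from pipecat.processors.metrics.sentry import SentryMetrics
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService

# Initialize Sentry before any metrics are collected.
sentry_sdk.init(
    dsn=os.getenv("SENTRY_DSN"),
    traces_sample_rate=1.0,
)

# Attach a separate SentryMetrics instance to each service you want to monitor.
llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",
    metrics=SentryMetrics(),
)
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    metrics=SentryMetrics(),
)

TTFB and processing-duration transactions for both processors then appear in Sentry's performance monitoring; per the fallback behavior described above, the same code only logs warnings if Sentry is missing or uninitialized.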
android_api-reference_d0f255ab.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/android/api-reference#content
Title: All modules
==================================================

All modules: pipecat-client-android, pipecat-transport-daily, pipecat-transport-gemini-live-websocket, pipecat-transport-openai-realtime-webrtc
android_introduction_7ffdd137.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/android/introduction#installation
Title: SDK Introduction - Pipecat
==================================================

SDK Introduction - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Android SDK SDK Introduction Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat Android SDK provides a Kotlin implementation for building voice and multimodal AI applications on Android. It handles: Real-time audio and video streaming Bot communication and state management Media device handling Configuration management Event handling Installation Add the dependency for your chosen transport to your build.gradle file. For example, to use the Daily transport: Copy Ask AI implementation "ai.pipecat:daily-transport:0.3.3" Example Here’s a simple example using Daily as the transport layer. Note that the clientConfig is optional and depends on what is required by the bot backend. Copy Ask AI val clientConfig = listOf ( ServiceConfig ( service = "llm" , options = listOf ( Option ( "model" , "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo" ), Option ( "messages" , Value. Array ( Value. Object ( "role" to Value. Str ( "system" ), "content" to Value. Str ( "You are a helpful assistant." ) ) )) ) ), ServiceConfig ( service = "tts" , options = listOf ( Option ( "voice" , "79a125e8-cd45-4c13-8a67-188112f4dd22" ) ) ) ) val callbacks = object : RTVIEventCallbacks () { override fun onBackendError (message: String ) { Log. e (TAG, "Error from backend: $message " ) } } val options = RTVIClientOptions ( services = listOf ( ServiceRegistration ( "llm" , "together" ), ServiceRegistration ( "tts" , "cartesia" )), params = RTVIClientParams (baseUrl = "<your API url>" , config = clientConfig) ) val client = RTVIClient (DailyTransport. Factory (context), callbacks, options) client. connect (). await () // Using Coroutines // Or using callbacks: // client.start().withCallback { /* handle completion */ } Documentation API Reference Complete SDK API documentation Daily Transport WebRTC implementation using Daily OpenAIRealTimeWebRTCTransport API Reference On this page Installation Example Documentation Assistant Responses are generated using AI and may contain mistakes.
audio_audio-buffer-processor_3a034d3c.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#constructor
Title: AudioBufferProcessor - Pipecat
==================================================

AudioBufferProcessor - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Audio Processing AudioBufferProcessor Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing AudioBufferProcessor KoalaFilter KrispFilter NoisereduceFilter SileroVADAnalyzer SoundfileMixer Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows. Constructor Copy Ask AI AudioBufferProcessor( sample_rate = None , num_channels = 1 , buffer_size = 0 , enable_turn_audio = False , ** kwargs ) Parameters sample_rate Optional[int] default: "None" The desired output sample rate in Hz. If None , uses the transport’s sample rate from the StartFrame . num_channels int default: "1" Number of output audio channels: 1 : Mono output (user and bot audio are mixed together) 2 : Stereo output (user audio on left channel, bot audio on right channel) buffer_size int default: "0" Buffer size in bytes that triggers audio data events: 0 : Events only trigger when recording stops >0 : Events trigger whenever buffer reaches this size (useful for chunked processing) enable_turn_audio bool default: "False" Whether to enable per-turn audio event handlers ( on_user_turn_audio_data and on_bot_turn_audio_data ). Properties sample_rate Copy Ask AI @ property def sample_rate ( self ) -> int The current sample rate of the audio processor in Hz. num_channels Copy Ask AI @ property def num_channels ( self ) -> int The number of channels in the audio output (1 for mono, 2 for stereo). Methods start_recording() Copy Ask AI async def start_recording () Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers. stop_recording() Copy Ask AI async def stop_recording () Stop recording and trigger final audio data handlers with any remaining buffered audio. has_audio() Copy Ask AI def has_audio () -> bool Check if both user and bot audio buffers contain data. Returns: True if both buffers contain audio data. Event Handlers The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @processor.event_handler() decorator. on_audio_data Triggered when buffer_size is reached or recording stops, providing merged audio. 
Copy Ask AI @audiobuffer.event_handler ( "on_audio_data" ) async def on_audio_data ( buffer , audio : bytes , sample_rate : int , num_channels : int ): # Handle merged audio data pass Parameters: buffer : The AudioBufferProcessor instance audio : Merged audio data (format depends on num_channels setting) sample_rate : Sample rate in Hz num_channels : Number of channels (1 or 2) on_track_audio_data Triggered alongside on_audio_data , providing separate user and bot audio tracks. Copy Ask AI @audiobuffer.event_handler ( "on_track_audio_data" ) async def on_track_audio_data ( buffer , user_audio : bytes , bot_audio : bytes , sample_rate : int , num_channels : int ): # Handle separate audio tracks pass Parameters: buffer : The AudioBufferProcessor instance user_audio : Raw user audio bytes (always mono) bot_audio : Raw bot audio bytes (always mono) sample_rate : Sample rate in Hz num_channels : Always 1 for individual tracks on_user_turn_audio_data Triggered when a user speaking turn ends. Requires enable_turn_audio=True . Copy Ask AI @audiobuffer.event_handler ( "on_user_turn_audio_data" ) async def on_user_turn_audio_data ( buffer , audio : bytes , sample_rate : int , num_channels : int ): # Handle user turn audio pass Parameters: buffer : The AudioBufferProcessor instance audio : Audio data from the user’s speaking turn sample_rate : Sample rate in Hz num_channels : Always 1 (mono) on_bot_turn_audio_data Triggered when a bot speaking turn ends. Requires enable_turn_audio=True . Copy Ask AI @audiobuffer.event_handler ( "on_bot_turn_audio_data" ) async def on_bot_turn_audio_data ( buffer , audio : bytes , sample_rate : int , num_channels : int ): # Handle bot turn audio pass Parameters: buffer : The AudioBufferProcessor instance audio : Audio data from the bot’s speaking turn sample_rate : Sample rate in Hz num_channels : Always 1 (mono) Audio Processing Features Automatic resampling : Converts incoming audio to the specified sample rate Buffer synchronization : Aligns user and bot audio streams temporally Silence insertion : Fills gaps in non-continuous audio streams to maintain timing Turn tracking : Monitors speaking turns when enable_turn_audio=True Integration Notes STT Audio Passthrough If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor: Copy Ask AI stt = DeepgramSTTService( api_key = os.getenv( "DEEPGRAM_API_KEY" ), audio_passthrough = True , ) audio_passthrough is enabled by default. Pipeline Placement Add the AudioBufferProcessor after transport.output() to capture both user and bot audio: Copy Ask AI pipeline = Pipeline([ transport.input(), # ... other processors ... transport.output(), audiobuffer, # Place after audio output # ... remaining processors ... ]) UserIdleProcessor KoalaFilter On this page Overview Constructor Parameters Properties sample_rate num_channels Methods start_recording() stop_recording() has_audio() Event Handlers on_audio_data on_track_audio_data on_user_turn_audio_data on_bot_turn_audio_data Audio Processing Features Integration Notes STT Audio Passthrough Pipeline Placement Assistant Responses are generated using AI and may contain mistakes.
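As a brief sketch of the on_audio_data workflow above (assuming the pipecat.processors.audio.audio_buffer_processor import path; the WAV-writing part is illustrative rather than part of the processor's API), merged stereo audio can be written to disk when the buffer flushes:

import io
import wave

from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor

# Stereo output: user audio on the left channel, bot audio on the right.
audiobuffer = AudioBufferProcessor(num_channels=2)


@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # Wrap the merged PCM bytes in a WAV container and write them to disk.
    with io.BytesIO() as wav_io:
        with wave.open(wav_io, "wb") as wf:
            wf.setnchannels(num_channels)
            wf.setsampwidth(2)  # 16-bit PCM samples
            wf.setframerate(sample_rate)
            wf.writeframes(audio)
        with open("conversation.wav", "wb") as f:
            f.write(wav_io.getvalue())

Recording still has to be started and stopped explicitly, for example await audiobuffer.start_recording() when a client connects and await audiobuffer.stop_recording() when it disconnects, and the processor belongs after transport.output() as noted under Pipeline Placement.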
audio_krisp-filter_06d586db.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/audio/krisp-filter#param-model-path
Title: KrispFilter - Pipecat
==================================================

KrispFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Audio Processing KrispFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing AudioBufferProcessor KoalaFilter KrispFilter NoisereduceFilter SileroVADAnalyzer SoundfileMixer Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview KrispFilter is an audio processor that reduces background noise in real-time audio streams using Krisp AI technology. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. To use Krisp, you need a Krisp SDK license. Get started at Krisp.ai . Looking for help getting started with Krisp and Pipecat? Checkout our Krisp noise cancellation guide . Installation The Krisp filter requires additional dependencies: Copy Ask AI pip install "pipecat-ai[krisp]" Environment Variables You need to provide the path to the Krisp model. This can either be done by setting the KRISP_MODEL_PATH environment variable or by setting the model_path in the constructor. Constructor Parameters sample_type str default: "PCM_16" Audio sample type format channels int default: "1" Number of audio channels model_path str default: "None" Path to the Krisp model file. You can set the model_path directly. Alternatively, you can set the KRISP_MODEL_PATH environment variable to the model file path. Input Frames FilterEnableFrame Frame Specific control frame to toggle filtering on/off Copy Ask AI from pipecat.frames.frames import FilterEnableFrame # Disable noise reduction await task.queue_frame(FilterEnableFrame( False )) # Re-enable noise reduction await task.queue_frame(FilterEnableFrame( True )) Usage Example Copy Ask AI from pipecat.audio.filters.krisp_filter import KrispFilter transport = DailyTransport( room_url, token, "Respond bot" , DailyParams( audio_in_filter = KrispFilter(), # Enable Krisp noise reduction audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer(), ), ) Audio Flow Notes Requires Krisp SDK and model file to be available Supports real-time audio processing Supports additional features like background voice removal Handles PCM_16 audio format Thread-safe for pipeline processing Can be dynamically enabled/disabled Maintains audio quality while reducing noise Efficient processing for low latency KoalaFilter NoisereduceFilter On this page Overview Installation Environment Variables Constructor Parameters Input Frames Usage Example Audio Flow Notes Assistant Responses are generated using AI and may contain mistakes.
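A minimal sketch combining the constructor parameters and the FilterEnableFrame control frame described above; the explicit model_path value and the task variable are assumptions for illustration.

import os

from pipecat.audio.filters.krisp_filter import KrispFilter
from pipecat.frames.frames import FilterEnableFrame

# Point the filter at the Krisp model explicitly instead of relying on the
# KRISP_MODEL_PATH environment variable.
krisp_filter = KrispFilter(
    sample_type="PCM_16",
    channels=1,
    model_path=os.getenv("KRISP_MODEL_PATH"),
)


async def pause_noise_reduction(task):
    # Temporarily disable filtering, e.g. while playing pre-recorded audio.
    await task.queue_frame(FilterEnableFrame(False))


async def resume_noise_reduction(task):
    await task.queue_frame(FilterEnableFrame(True))

The filter instance is then passed to the transport as audio_in_filter, exactly as in the usage example above.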
audio_krisp-filter_78ecbba3.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/audio/krisp-filter#audio-flow
Title: KrispFilter - Pipecat
==================================================

KrispFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Audio Processing KrispFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing AudioBufferProcessor KoalaFilter KrispFilter NoisereduceFilter SileroVADAnalyzer SoundfileMixer Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview KrispFilter is an audio processor that reduces background noise in real-time audio streams using Krisp AI technology. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. To use Krisp, you need a Krisp SDK license. Get started at Krisp.ai . Looking for help getting started with Krisp and Pipecat? Checkout our Krisp noise cancellation guide . Installation The Krisp filter requires additional dependencies: Copy Ask AI pip install "pipecat-ai[krisp]" Environment Variables You need to provide the path to the Krisp model. This can either be done by setting the KRISP_MODEL_PATH environment variable or by setting the model_path in the constructor. Constructor Parameters sample_type str default: "PCM_16" Audio sample type format channels int default: "1" Number of audio channels model_path str default: "None" Path to the Krisp model file. You can set the model_path directly. Alternatively, you can set the KRISP_MODEL_PATH environment variable to the model file path. Input Frames FilterEnableFrame Frame Specific control frame to toggle filtering on/off Copy Ask AI from pipecat.frames.frames import FilterEnableFrame # Disable noise reduction await task.queue_frame(FilterEnableFrame( False )) # Re-enable noise reduction await task.queue_frame(FilterEnableFrame( True )) Usage Example Copy Ask AI from pipecat.audio.filters.krisp_filter import KrispFilter transport = DailyTransport( room_url, token, "Respond bot" , DailyParams( audio_in_filter = KrispFilter(), # Enable Krisp noise reduction audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer(), ), ) Audio Flow Notes Requires Krisp SDK and model file to be available Supports real-time audio processing Supports additional features like background voice removal Handles PCM_16 audio format Thread-safe for pipeline processing Can be dynamically enabled/disabled Maintains audio quality while reducing noise Efficient processing for low latency KoalaFilter NoisereduceFilter On this page Overview Installation Environment Variables Constructor Parameters Input Frames Usage Example Audio Flow Notes Assistant Responses are generated using AI and may contain mistakes.
audio_silero-vad-analyzer_beb54155.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer
Title: SileroVADAnalyzer - Pipecat
==================================================

SileroVADAnalyzer - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Audio Processing SileroVADAnalyzer Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing AudioBufferProcessor KoalaFilter KrispFilter NoisereduceFilter SileroVADAnalyzer SoundfileMixer Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview SileroVADAnalyzer is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It provides high-accuracy speech detection with efficient processing using ONNX runtime. Installation The Silero VAD analyzer requires additional dependencies: Copy Ask AI pip install "pipecat-ai[silero]" Constructor Parameters sample_rate int default: "None" Audio sample rate in Hz. Must be either 8000 or 16000. params VADParams default: "VADParams()" Voice Activity Detection parameters object Show properties confidence float default: "0.7" Confidence threshold for speech detection. Higher values make detection more strict. Must be between 0 and 1. start_secs float default: "0.2" Time in seconds that speech must be detected before transitioning to SPEAKING state. stop_secs float default: "0.8" Time in seconds of silence required before transitioning back to QUIET state. min_volume float default: "0.6" Minimum audio volume threshold for speech detection. Must be between 0 and 1. Usage Example Copy Ask AI transport = DailyTransport( room_url, token, "Respond bot" , DailyParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer( params = VADParams( stop_secs = 0.5 )), ), ) Technical Details Sample Rate Requirements The analyzer supports two sample rates: 8000 Hz (256 samples per frame) 16000 Hz (512 samples per frame) Model Management Uses ONNX runtime for efficient inference Automatically resets model state every 5 seconds to manage memory Runs on CPU by default for consistent performance Includes built-in model file Notes High-accuracy speech detection Efficient ONNX-based processing Automatic memory management Thread-safe for pipeline processing Built-in model file included CPU-optimized inference Supports 8kHz and 16kHz audio NoisereduceFilter SoundfileMixer On this page Overview Installation Constructor Parameters Usage Example Technical Details Sample Rate Requirements Notes Assistant Responses are generated using AI and may contain mistakes.
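A short sketch of the constructor parameters listed above; the import paths pipecat.audio.vad.silero and pipecat.audio.vad.vad_analyzer are assumptions based on current Pipecat releases rather than something stated on this page.

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams

# Tune the analyzer for a slightly faster end-of-speech decision.
vad_analyzer = SileroVADAnalyzer(
    params=VADParams(
        confidence=0.7,  # speech-detection threshold, 0 to 1
        start_secs=0.2,  # speech required before entering SPEAKING
        stop_secs=0.5,   # silence required before returning to QUIET
        min_volume=0.6,  # minimum volume treated as speech
    )
)

The analyzer is then passed as vad_analyzer in the transport params, as in the usage example above; sample_rate is normally left unset so it follows the transport, and must be 8000 or 16000 Hz if set.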
audio_silero-vad-analyzer_c604e722.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer#notes
Title: SileroVADAnalyzer - Pipecat
==================================================

SileroVADAnalyzer - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Audio Processing SileroVADAnalyzer Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing AudioBufferProcessor KoalaFilter KrispFilter NoisereduceFilter SileroVADAnalyzer SoundfileMixer Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview SileroVADAnalyzer is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It provides high-accuracy speech detection with efficient processing using ONNX runtime. Installation The Silero VAD analyzer requires additional dependencies: Copy Ask AI pip install "pipecat-ai[silero]" Constructor Parameters sample_rate int default: "None" Audio sample rate in Hz. Must be either 8000 or 16000. params VADParams default: "VADParams()" Voice Activity Detection parameters object Show properties confidence float default: "0.7" Confidence threshold for speech detection. Higher values make detection more strict. Must be between 0 and 1. start_secs float default: "0.2" Time in seconds that speech must be detected before transitioning to SPEAKING state. stop_secs float default: "0.8" Time in seconds of silence required before transitioning back to QUIET state. min_volume float default: "0.6" Minimum audio volume threshold for speech detection. Must be between 0 and 1. Usage Example Copy Ask AI transport = DailyTransport( room_url, token, "Respond bot" , DailyParams( audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer( params = VADParams( stop_secs = 0.5 )), ), ) Technical Details Sample Rate Requirements The analyzer supports two sample rates: 8000 Hz (256 samples per frame) 16000 Hz (512 samples per frame) Model Management Uses ONNX runtime for efficient inference Automatically resets model state every 5 seconds to manage memory Runs on CPU by default for consistent performance Includes built-in model file Notes High-accuracy speech detection Efficient ONNX-based processing Automatic memory management Thread-safe for pipeline processing Built-in model file included CPU-optimized inference Supports 8kHz and 16kHz audio NoisereduceFilter SoundfileMixer On this page Overview Installation Constructor Parameters Usage Example Technical Details Sample Rate Requirements Notes Assistant Responses are generated using AI and may contain mistakes.
base-classes_media_68c24817.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/base-classes/media#real-time-processing
Title: Overview - Pipecat
==================================================

Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
base-classes_speech_3040381e.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/base-classes/speech#next-steps
Title: Overview - Pipecat
==================================================

Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
c_transport_ae6d2316.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/c++/transport#dependencies
Title: Daily WebRTC Transport - Pipecat
==================================================

Daily WebRTC Transport: The Daily transport implementation enables real-time audio and video communication in your Pipecat C++ applications using Daily's WebRTC infrastructure. Dependencies: Daily Core C++ SDK: download the Daily Core C++ SDK from the available releases for your platform and set export DAILY_CORE_PATH=/path/to/daily-core-sdk. Pipecat C++ SDK: build the base Pipecat C++ SDK first and set export PIPECAT_SDK_PATH=/path/to/pipecat-client-cxx. Building: first set the environment variables PIPECAT_SDK_PATH=/path/to/pipecat-client-cxx and DAILY_CORE_PATH=/path/to/daily-core-sdk, then build the project. Linux/macOS: cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release, then ninja -C build. Windows: initialize the Visual Studio environment with "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Auxiliary\Build\vcvarsall.bat" amd64, then run cmake . -Bbuild --preset vcpkg and cmake --build build --config Release. Examples: Basic Client (simple C++ implementation example), Audio Client (C++ client with PortAudio support), Node.js Server Example (Node.js proxy implementation).
client_rtvi-standard_0f269efd.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#metrics-and-monitoring
Title: The RTVI Standard - Pipecat
==================================================

The RTVI Standard - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation The RTVI Standard Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real-time. This page documents version 1.0 of the RTVI standard, released in June 2025. Key Features Connection Management RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state. Transcriptions The standard includes built-in support for real-time transcription of audio streams. Client-Server Messaging The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses. Advanced LLM Interactions The standard supports advanced interactions with large language models (LLMs), including context management, function call handline, and search results. Service-Specific Insights RTVI supports events to provide insight into the input/output and state for typical services that exist in speech-to-speech workflows. Metrics and Monitoring RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services. Terms Client : The front-end application or user interface that interacts with the RTVI server. Server : The backend-end service that runs the AI framework and processes requests from the client. User : The end user interacting with the client application. Bot : The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service. RTVI Message Format The messages defined as part of the RTVI protocol adhere to the following format: Copy Ask AI { "id" : string , "label" : "rtvi-ai" , "type" : string , "data" : unknown } id string A unique identifier for the message, used to correlate requests and responses. label string default: "rtvi-ai" required A label that identifies this message as an RTVI message. This field is required and should always be set to 'rtvi-ai' . type string required The type of message being sent. This field is required and should be set to one of the predefined RTVI message types listed below. data unknown The payload of the message, which can be any data structure relevant to the message type. RTVI Message Types Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type below includes either a 🤖 or 🏄 emoji to denote whether the message is sent from the bot (🤖) or client (🏄). Connection Management client-ready 🏄 Indicates that the client is ready to receive messages and interact with the server. 
Typically sent after the transport media channels have connected. type : 'client-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : AboutClient Object An object containing information about the client, such as its rtvi-version, client library, and any other relevant metadata. The AboutClient object follows this structure: Show AboutClient library string required library_version string platform string platform_version string platform_details any Any platform-specific details that may be relevant to the server. This could include information about the browser, operating system, or any other environment-specific data needed by the server. This field is optional and open-ended, so please be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personal-identifiable information. bot-ready 🤖 Indicates that the bot is ready to receive messages and interact with the client. Typically send after the transport media channels have connected. type : 'bot-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : any (Optional) An object containing information about the server or bot. It’s structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time, without any built-in security concerns. Please be mindful of the data you include here and any security concerns that may arise from exposing sensitive information. disconnect-bot 🏄 Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: Disconnets should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client may want to remain connected to the transport but no longer wishes to interact with the bot. type : 'disconnect-bot' data : undefined error 🤖 Indicates an error occurred during bot initialization or runtime. type : 'error' data : message : string Description of the error. fatal : boolean Indicates if the error is fatal to the session. Transcription user-started-speaking 🤖 Emitted when the user begins speaking type : 'user-started-speaking' data : None user-stopped-speaking 🤖 Emitted when the user stops speaking type : 'user-stopped-speaking' data : None bot-started-speaking 🤖 Emitted when the bot begins speaking type : 'bot-started-speaking' data : None bot-stopped-speaking 🤖 Emitted when the bot stops speaking type : 'bot-stopped-speaking' data : None user-transcription 🤖 Real-time transcription of user speech, including both partial and final results. type : 'user-transcription' data : text : string The transcribed text of the user. final : boolean Indicates if this is a final transcription or a partial result. timestamp : string The timestamp when the transcription was generated. user_id : string Identifier for the user who spoke. bot-transcription 🤖 Transcription of the bot’s speech. Note: This protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions. Rather, the event is typically sent for each sentence of the bot’s response. 
This difference is currently due to limitations in TTS services which mostly do not support (or support well), accurate timing information. If/when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription to match your bot’s speaking, you can try using the bot-tts-text message type. type : 'bot-transcription' data : text : string The transcribed text from the bot, typically aggregated at a per-sentence level. Client-Server Messaging server-message 🤖 An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client. type : 'server-message' data : any The data can be any JSON-serializable object, formatted according to your own specifications. client-message 🏄 An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server. type : 'client-message' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. server-response 🤖 An message sent from the server to the client in response to a client-message . IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'client-message' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. error-response 🤖 Error response to a specific client message. IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'error-response' data : error : string Advanced LLM Interactions append-to-context 🏄 A message sent from the client to the server to append data to the context of the current llm conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant. type : 'append-to-context' data : role : "user" | "assistant" The role the context should be appended to. Currently only supports "user" and "assistant" . content : unknown The content to append to the context. This can be any data structure the llm understand. run_immediately : boolean (optional) Indicates whether the context should be run immediately after appending. Defaults to false . If set to false , the context will be appended but not executed until the next llm run. llm-function-call 🤖 A function call request from the LLM, sent from the bot to the client. Note that for most cases, an LLM function call will be handled completely server-side. However, in the event that the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required. type : 'llm-function-call' data : function_name : string Name of the function to be called. tool_call_id : string Unique identifier for this function call. args : Record<string, unknown> Arguments to be passed to the function. llm-function-call-result 🏄 The result of the function call requested by the LLM, returned from the client. type : 'llm-function-call-result' data : function_name : string Name of the called function. 
tool_call_id : string Identifier matching the original function call. args : Record<string, unknown> Arguments that were passed to the function. result : Record<string, unknown> | string The result returned by the function. bot-llm-search-response 🤖 Search results from the LLM’s knowledge base. Currently, Google Gemini is the only LLM that supports built-in search. However, we expect other LLMs to follow suite, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies. type : 'bot-llm-search-response' data : search_result : string (optional) Raw search result text. rendered_content : string (optional) Formatted version of the search results. origins : Array<Origin Object> Source information and confidence scores for search results. The Origin Object follows this structure: Copy Ask AI { "site_uri" : string (optional) , "site_title" : string (optional) , "results" : Array< { "text" : string , "confidence" : number [] } > } Example: Copy Ask AI "id" : undefined "label" : "rtvi-ai" "type" : "bot-llm-search-response" "data" : { "origins" : [ { "results" : [ { "confidence" : [ 0.9881149530410768 ], "text" : "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm." }, { "confidence" : [ 0.9692034721374512 ], "ext" : "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm." } ], "site_title" : "vanderbilt.edu" , "site_uri" : "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi" }, { "results" : [ { "confidence" : [ 0.6554043292999268 ], "text" : "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields." } ], "site_title" : "wikipedia.org" , "site_uri" : "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo" }], "rendered_content" : "<style> \n .container ... </div> \n </div> \n " , "search_result" : "Several events are happening at Vanderbilt University: \n\n * Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm. \n * A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm. \n\n In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields. For the most recent news, you should check Vanderbilt's official news website. \n " } Service-Specific Insights bot-llm-started 🤖 Indicates LLM processing has begun type : bot-llm-started data : None bot-llm-stopped 🤖 Indicates LLM processing has completed type : bot-llm-stopped data : None user-llm-text 🤖 Aggregated user input text that is sent to the LLM. type : 'user-llm-text' data : text : string The user’s input text to be processed by the LLM. bot-llm-text 🤖 Individual tokens streamed from the LLM as they are generated. type : 'bot-llm-text' data : text : string The token text from the LLM. bot-tts-started 🤖 Indicates text-to-speech (TTS) processing has begun. type : 'bot-tts-started' data : None bot-tts-stopped 🤖 Indicates text-to-speech (TTS) processing has completed. 
type : 'bot-tts-stopped' data : None bot-tts-text 🤖 The per-token text output of the text-to-speech (TTS) service (what the TTS actually says). type : 'bot-tts-text' data : text : string The text representation of the generated bot speech. Metrics and Monitoring metrics 🤖 Performance metrics for various processing stages and services. Each message will contain entries for one or more of the metrics types: processing , ttfb , characters . type : 'metrics' data : processing : [See Below] (optional) Processing time metrics. ttfb : [See Below] (optional) Time to first byte metrics. characters : [See Below] (optional) Character processing metrics. For each metric type, the data structure is an array of objects with the following structure: processor : string The name of the processor or service that generated the metric. value : number The value of the metric, typically in milliseconds or character count. model : string (optional) The model of the service that generated the metric, if applicable. Example: Copy Ask AI { "type" : "metrics" , "data" : { "processing" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 0.0005140304565429688 } ], "ttfb" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 0.1573178768157959 } ], "characters" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 43 } ] } } Client SDKs RTVIClient Migration Guide On this page Key Features Terms RTVI Message Format RTVI Message Types Connection Management client-ready 🏄 bot-ready 🤖 disconnect-bot 🏄 error 🤖 Transcription user-started-speaking 🤖 user-stopped-speaking 🤖 bot-started-speaking 🤖 bot-stopped-speaking 🤖 user-transcription 🤖 bot-transcription 🤖 Client-Server Messaging server-message 🤖 client-message 🏄 server-response 🤖 error-response 🤖 Advanced LLM Interactions append-to-context 🏄 llm-function-call 🤖 llm-function-call-result 🏄 bot-llm-search-response 🤖 Service-Specific Insights bot-llm-started 🤖 bot-llm-stopped 🤖 user-llm-text 🤖 bot-llm-text 🤖 bot-tts-started 🤖 bot-tts-stopped 🤖 bot-tts-text 🤖 Metrics and Monitoring metrics 🤖 Assistant Responses are generated using AI and may contain mistakes.
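To make the message format above concrete, here is a small sketch that builds a client-message envelope; the helper name and payload are made up for illustration. A server-response (or error-response) would echo the same id so the client can correlate the reply with its request.

import json
import uuid


def make_client_message(t: str, d: dict) -> dict:
    # Every RTVI message carries an id, the fixed "rtvi-ai" label, a type,
    # and a type-specific data payload; for client-message the payload uses
    # a "t" field for the custom message type and an optional "d" field.
    return {
        "id": str(uuid.uuid4()),
        "label": "rtvi-ai",
        "type": "client-message",
        "data": {"t": t, "d": d},
    }


message = make_client_message("set-language", {"language": "en-US"})
print(json.dumps(message, indent=2))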
|
client_rtvi-standard_10095bd8.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/client/rtvi-standard#user-stopped-speaking-%F0%9F%A4%96
|
2 |
+
Title: The RTVI Standard - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
|
client_rtvi-standard_a3cec5f6.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/client/rtvi-standard#key-features
|
2 |
+
Title: The RTVI Standard - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
|
client_rtvi-standard_de003d42.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/client/rtvi-standard#bot-stopped-speaking-%F0%9F%A4%96
|
2 |
+
Title: The RTVI Standard - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
|
client_rtvi-standard_f54a4952.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/client/rtvi-standard#param-platform-version
|
2 |
+
Title: The RTVI Standard - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
The RTVI Standard - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation The RTVI Standard Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real-time. This page documents version 1.0 of the RTVI standard, released in June 2025. Key Features Connection Management RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state. Transcriptions The standard includes built-in support for real-time transcription of audio streams. Client-Server Messaging The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses. Advanced LLM Interactions The standard supports advanced interactions with large language models (LLMs), including context management, function call handline, and search results. Service-Specific Insights RTVI supports events to provide insight into the input/output and state for typical services that exist in speech-to-speech workflows. Metrics and Monitoring RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services. Terms Client : The front-end application or user interface that interacts with the RTVI server. Server : The backend-end service that runs the AI framework and processes requests from the client. User : The end user interacting with the client application. Bot : The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service. RTVI Message Format The messages defined as part of the RTVI protocol adhere to the following format: Copy Ask AI { "id" : string , "label" : "rtvi-ai" , "type" : string , "data" : unknown } id string A unique identifier for the message, used to correlate requests and responses. label string default: "rtvi-ai" required A label that identifies this message as an RTVI message. This field is required and should always be set to 'rtvi-ai' . type string required The type of message being sent. This field is required and should be set to one of the predefined RTVI message types listed below. data unknown The payload of the message, which can be any data structure relevant to the message type. RTVI Message Types Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type below includes either a 🤖 or 🏄 emoji to denote whether the message is sent from the bot (🤖) or client (🏄). Connection Management client-ready 🏄 Indicates that the client is ready to receive messages and interact with the server. 
Typically sent after the transport media channels have connected. type : 'client-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : AboutClient Object An object containing information about the client, such as its rtvi-version, client library, and any other relevant metadata. The AboutClient object follows this structure: Show AboutClient library string required library_version string platform string platform_version string platform_details any Any platform-specific details that may be relevant to the server. This could include information about the browser, operating system, or any other environment-specific data needed by the server. This field is optional and open-ended, so please be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personal-identifiable information. bot-ready 🤖 Indicates that the bot is ready to receive messages and interact with the client. Typically send after the transport media channels have connected. type : 'bot-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : any (Optional) An object containing information about the server or bot. It’s structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time, without any built-in security concerns. Please be mindful of the data you include here and any security concerns that may arise from exposing sensitive information. disconnect-bot 🏄 Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: Disconnets should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client may want to remain connected to the transport but no longer wishes to interact with the bot. type : 'disconnect-bot' data : undefined error 🤖 Indicates an error occurred during bot initialization or runtime. type : 'error' data : message : string Description of the error. fatal : boolean Indicates if the error is fatal to the session. Transcription user-started-speaking 🤖 Emitted when the user begins speaking type : 'user-started-speaking' data : None user-stopped-speaking 🤖 Emitted when the user stops speaking type : 'user-stopped-speaking' data : None bot-started-speaking 🤖 Emitted when the bot begins speaking type : 'bot-started-speaking' data : None bot-stopped-speaking 🤖 Emitted when the bot stops speaking type : 'bot-stopped-speaking' data : None user-transcription 🤖 Real-time transcription of user speech, including both partial and final results. type : 'user-transcription' data : text : string The transcribed text of the user. final : boolean Indicates if this is a final transcription or a partial result. timestamp : string The timestamp when the transcription was generated. user_id : string Identifier for the user who spoke. bot-transcription 🤖 Transcription of the bot’s speech. Note: This protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions. Rather, the event is typically sent for each sentence of the bot’s response. 
This difference is currently due to limitations in TTS services which mostly do not support (or support well), accurate timing information. If/when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription to match your bot’s speaking, you can try using the bot-tts-text message type. type : 'bot-transcription' data : text : string The transcribed text from the bot, typically aggregated at a per-sentence level. Client-Server Messaging server-message 🤖 An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client. type : 'server-message' data : any The data can be any JSON-serializable object, formatted according to your own specifications. client-message 🏄 An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server. type : 'client-message' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. server-response 🤖 An message sent from the server to the client in response to a client-message . IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'client-message' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. error-response 🤖 Error response to a specific client message. IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'error-response' data : error : string Advanced LLM Interactions append-to-context 🏄 A message sent from the client to the server to append data to the context of the current llm conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant. type : 'append-to-context' data : role : "user" | "assistant" The role the context should be appended to. Currently only supports "user" and "assistant" . content : unknown The content to append to the context. This can be any data structure the llm understand. run_immediately : boolean (optional) Indicates whether the context should be run immediately after appending. Defaults to false . If set to false , the context will be appended but not executed until the next llm run. llm-function-call 🤖 A function call request from the LLM, sent from the bot to the client. Note that for most cases, an LLM function call will be handled completely server-side. However, in the event that the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required. type : 'llm-function-call' data : function_name : string Name of the function to be called. tool_call_id : string Unique identifier for this function call. args : Record<string, unknown> Arguments to be passed to the function. llm-function-call-result 🏄 The result of the function call requested by the LLM, returned from the client. type : 'llm-function-call-result' data : function_name : string Name of the called function. 
tool_call_id : string Identifier matching the original function call. args : Record<string, unknown> Arguments that were passed to the function. result : Record<string, unknown> | string The result returned by the function. bot-llm-search-response 🤖 Search results from the LLM’s knowledge base. Currently, Google Gemini is the only LLM that supports built-in search. However, we expect other LLMs to follow suit, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies. type : 'bot-llm-search-response' data : search_result : string (optional) Raw search result text. rendered_content : string (optional) Formatted version of the search results. origins : Array<Origin Object> Source information and confidence scores for search results. The Origin Object follows this structure: Copy Ask AI { "site_uri" : string (optional) , "site_title" : string (optional) , "results" : Array< { "text" : string , "confidence" : number [] } > } Example: Copy Ask AI "id" : undefined "label" : "rtvi-ai" "type" : "bot-llm-search-response" "data" : { "origins" : [ { "results" : [ { "confidence" : [ 0.9881149530410768 ], "text" : "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm." }, { "confidence" : [ 0.9692034721374512 ], "text" : "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm." } ], "site_title" : "vanderbilt.edu" , "site_uri" : "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi" }, { "results" : [ { "confidence" : [ 0.6554043292999268 ], "text" : "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields." } ], "site_title" : "wikipedia.org" , "site_uri" : "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo" }], "rendered_content" : "<style> \n .container ... </div> \n </div> \n " , "search_result" : "Several events are happening at Vanderbilt University: \n\n * Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm. \n * A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm. \n\n In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields. For the most recent news, you should check Vanderbilt's official news website. \n " } Service-Specific Insights bot-llm-started 🤖 Indicates LLM processing has begun type : 'bot-llm-started' data : None bot-llm-stopped 🤖 Indicates LLM processing has completed type : 'bot-llm-stopped' data : None user-llm-text 🤖 Aggregated user input text that is sent to the LLM. type : 'user-llm-text' data : text : string The user’s input text to be processed by the LLM. bot-llm-text 🤖 Individual tokens streamed from the LLM as they are generated. type : 'bot-llm-text' data : text : string The token text from the LLM. bot-tts-started 🤖 Indicates text-to-speech (TTS) processing has begun. type : 'bot-tts-started' data : None bot-tts-stopped 🤖 Indicates text-to-speech (TTS) processing has completed. 
type : 'bot-tts-stopped' data : None bot-tts-text 🤖 The per-token text output of the text-to-speech (TTS) service (what the TTS actually says). type : 'bot-tts-text' data : text : string The text representation of the generated bot speech. Metrics and Monitoring metrics 🤖 Performance metrics for various processing stages and services. Each message will contain entries for one or more of the metrics types: processing , ttfb , characters . type : 'metrics' data : processing : [See Below] (optional) Processing time metrics. ttfb : [See Below] (optional) Time to first byte metrics. characters : [See Below] (optional) Character processing metrics. For each metric type, the data structure is an array of objects with the following structure: processor : string The name of the processor or service that generated the metric. value : number The value of the metric, typically in milliseconds or character count. model : string (optional) The model of the service that generated the metric, if applicable. Example: Copy Ask AI { "type" : "metrics" , "data" : { "processing" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 0.0005140304565429688 } ], "ttfb" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 0.1573178768157959 } ], "characters" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 43 } ] } } Client SDKs RTVIClient Migration Guide On this page Key Features Terms RTVI Message Format RTVI Message Types Connection Management client-ready 🏄 bot-ready 🤖 disconnect-bot 🏄 error 🤖 Transcription user-started-speaking 🤖 user-stopped-speaking 🤖 bot-started-speaking 🤖 bot-stopped-speaking 🤖 user-transcription 🤖 bot-transcription 🤖 Client-Server Messaging server-message 🤖 client-message 🏄 server-response 🤖 error-response 🤖 Advanced LLM Interactions append-to-context 🏄 llm-function-call 🤖 llm-function-call-result 🏄 bot-llm-search-response 🤖 Service-Specific Insights bot-llm-started 🤖 bot-llm-stopped 🤖 user-llm-text 🤖 bot-llm-text 🤖 bot-tts-started 🤖 bot-tts-stopped 🤖 bot-tts-text 🤖 Metrics and Monitoring metrics 🤖 Assistant Responses are generated using AI and may contain mistakes.
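To make the client-message / server-response correlation described above concrete, here is a minimal, hypothetical exchange written as Python dicts (the id, t, and d values are illustrative placeholders, not defined by the RTVI standard):

# Client asks the server something via a custom message type.
client_msg = {
    "id": "msg-001",          # chosen by the client; used to correlate the reply
    "label": "rtvi-ai",
    "type": "client-message",
    "data": {"t": "get-weather", "d": {"city": "Nashville"}},
}

# Server replies; the id must match the originating client-message.
server_resp = {
    "id": "msg-001",
    "label": "rtvi-ai",
    "type": "server-response",
    "data": {"t": "get-weather", "d": {"temperature_f": 72}},
}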
|
daily_rest-helpers_4cde2775.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-num-endpoints
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details Classes DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. display_name string default: "sw-sip-dialin" Display name for the SIP endpoint video boolean default: false Whether video is enabled for SIP sip_mode string default: "dial-in" SIP connection mode num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. bucket_name string required Name of the S3 bucket for storing recordings bucket_region string required AWS region where the S3 bucket is located assume_role_arn string required ARN of the IAM role to assume for S3 access allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) DailyRoomProperties Properties that configure a Daily room’s behavior and features. exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) enable_chat boolean default: false Whether chat is enabled in the room enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled enable_emoji_reactions boolean default: false Whether emoji reactions are enabled eject_at_room_exp boolean default: false Whether to eject participants when room expires enable_dialout boolean Whether dial-out is enabled enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) geo string Geographic region for room max_participants number Maximum number of participants allowed in the room recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings sip DailyRoomSipParams SIP configuration parameters sip_uri dict SIP URI configuration (returned by Daily) start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) DailyRoomParams Parameters for creating a new Daily room. name string Room name (if not provided, one will be generated) privacy string default: "public" Room privacy setting (“private” or “public”) properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) DailyRoomObject Response object representing a Daily room. id string Unique room identifier name string Room name api_created boolean Whether the room was created via API privacy string Room privacy setting url string Complete room URL created_at string Room creation timestamp in ISO 8601 format config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) DailyMeetingTokenProperties Properties for configuring a Daily meeting token. room_name string The room this token is valid for. If not set, token is valid for all rooms. eject_at_token_exp boolean Whether to eject user when token expires eject_after_elapsed integer Eject user after this many seconds nbf integer “Not before” timestamp - users cannot join before this time exp integer Expiration timestamp - users cannot join after this time is_owner boolean Whether token grants owner privileges user_name string User’s display name in the meeting user_id string Unique identifier for the user (36 char limit) enable_screenshare boolean Whether user can share their screen start_video_off boolean Whether to join with video off start_audio_off boolean Whether to join with audio off enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) enable_prejoin_ui boolean Whether to show prejoin UI start_cloud_recording boolean Whether to start cloud recording when user joins permissions dict Initial default permissions for a non-meeting-owner participant DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) Initialize DailyRESTHelper Create a new instance of the Daily REST helper. daily_api_key string required Your Daily API key daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) Create Room Creates a new Daily room with specified parameters. params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) Get Room From URL Retrieves room information using a Daily room URL. room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) Get Token Generates a meeting token for a specific room. room_url string required The complete Daily room URL expiry_time float default: "3600" Token expiration time in seconds eject_at_token_exp bool default: "False" Whether to eject user when token expires owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) Delete Room By URL Deletes a room using its URL. room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) Delete Room By Name Deletes a room using its name. room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) Get Name From URL Extracts the room name from a Daily room URL. 
room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
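Putting the helper methods above together, a minimal end-to-end sketch might create a short-lived private room and a matching meeting token (the API key is a placeholder and the 15-minute expiry is an arbitrary choice):

import time
import aiohttp
from pipecat.transports.services.helpers.daily_rest import (
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
)

async def create_room_and_token():
    async with aiohttp.ClientSession() as session:
        helper = DailyRESTHelper(
            daily_api_key="your-api-key",  # replace with your Daily API key
            aiohttp_session=session,
        )
        # Room that expires in 15 minutes and ejects participants at expiry
        room = await helper.create_room(
            DailyRoomParams(
                privacy="private",
                properties=DailyRoomProperties(
                    exp=time.time() + 900,
                    eject_at_room_exp=True,
                ),
            )
        )
        # Token scoped to that room, valid for the same 15 minutes
        token = await helper.get_token(room_url=room.url, expiry_time=900)
        return room.url, token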
|
daily_rest-helpers_a9e4a966.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-room-url-1
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details Classes DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. display_name string default: "sw-sip-dialin" Display name for the SIP endpoint video boolean default: false Whether video is enabled for SIP sip_mode string default: "dial-in" SIP connection mode num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. bucket_name string required Name of the S3 bucket for storing recordings bucket_region string required AWS region where the S3 bucket is located assume_role_arn string required ARN of the IAM role to assume for S3 access allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) DailyRoomProperties Properties that configure a Daily room’s behavior and features. exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) enable_chat boolean default: false Whether chat is enabled in the room enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled enable_emoji_reactions boolean default: false Whether emoji reactions are enabled eject_at_room_exp boolean default: false Whether to eject participants when room expires enable_dialout boolean Whether dial-out is enabled enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) geo string Geographic region for room max_participants number Maximum number of participants allowed in the room recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings sip DailyRoomSipParams SIP configuration parameters sip_uri dict SIP URI configuration (returned by Daily) start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) DailyRoomParams Parameters for creating a new Daily room. name string Room name (if not provided, one will be generated) privacy string default: "public" Room privacy setting (“private” or “public”) properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) DailyRoomObject Response object representing a Daily room. id string Unique room identifier name string Room name api_created boolean Whether the room was created via API privacy string Room privacy setting url string Complete room URL created_at string Room creation timestamp in ISO 8601 format config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) DailyMeetingTokenProperties Properties for configuring a Daily meeting token. room_name string The room this token is valid for. If not set, token is valid for all rooms. eject_at_token_exp boolean Whether to eject user when token expires eject_after_elapsed integer Eject user after this many seconds nbf integer “Not before” timestamp - users cannot join before this time exp integer Expiration timestamp - users cannot join after this time is_owner boolean Whether token grants owner privileges user_name string User’s display name in the meeting user_id string Unique identifier for the user (36 char limit) enable_screenshare boolean Whether user can share their screen start_video_off boolean Whether to join with video off start_audio_off boolean Whether to join with audio off enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) enable_prejoin_ui boolean Whether to show prejoin UI start_cloud_recording boolean Whether to start cloud recording when user joins permissions dict Initial default permissions for a non-meeting-owner participant DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) Initialize DailyRESTHelper Create a new instance of the Daily REST helper. daily_api_key string required Your Daily API key daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) Create Room Creates a new Daily room with specified parameters. params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) Get Room From URL Retrieves room information using a Daily room URL. room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) Get Token Generates a meeting token for a specific room. room_url string required The complete Daily room URL expiry_time float default: "3600" Token expiration time in seconds eject_at_token_exp bool default: "False" Whether to eject user when token expires owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) Delete Room By URL Deletes a room using its URL. room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) Delete Room By Name Deletes a room using its name. room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) Get Name From URL Extracts the room name from a Daily room URL. 
room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
|
daily_rest-helpers_c4920582.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-enable-recording
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details Classes DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. display_name string default: "sw-sip-dialin" Display name for the SIP endpoint video boolean default: false Whether video is enabled for SIP sip_mode string default: "dial-in" SIP connection mode num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. bucket_name string required Name of the S3 bucket for storing recordings bucket_region string required AWS region where the S3 bucket is located assume_role_arn string required ARN of the IAM role to assume for S3 access allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) DailyRoomProperties Properties that configure a Daily room’s behavior and features. exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) enable_chat boolean default: false Whether chat is enabled in the room enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled enable_emoji_reactions boolean default: false Whether emoji reactions are enabled eject_at_room_exp boolean default: false Whether to eject participants when room expires enable_dialout boolean Whether dial-out is enabled enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) geo string Geographic region for room max_participants number Maximum number of participants allowed in the room recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings sip DailyRoomSipParams SIP configuration parameters sip_uri dict SIP URI configuration (returned by Daily) start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) DailyRoomParams Parameters for creating a new Daily room. name string Room name (if not provided, one will be generated) privacy string default: "public" Room privacy setting (“private” or “public”) properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) DailyRoomObject Response object representing a Daily room. id string Unique room identifier name string Room name api_created boolean Whether the room was created via API privacy string Room privacy setting url string Complete room URL ��� created_at string Room creation timestamp in ISO 8601 format config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) DailyMeetingTokenProperties Properties for configuring a Daily meeting token. room_name string The room this token is valid for. If not set, token is valid for all rooms. eject_at_token_exp boolean Whether to eject user when token expires eject_after_elapsed integer Eject user after this many seconds nbf integer “Not before” timestamp - users cannot join before this time exp integer Expiration timestamp - users cannot join after this time is_owner boolean Whether token grants owner privileges user_name string User’s display name in the meeting user_id string Unique identifier for the user (36 char limit) enable_screenshare boolean Whether user can share their screen start_video_off boolean Whether to join with video off start_audio_off boolean Whether to join with audio off enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) enable_prejoin_ui boolean Whether to show prejoin UI start_cloud_recording boolean Whether to start cloud recording when user joins permissions dict Initial default permissions for a non-meeting-owner participant DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) Initialize DailyRESTHelper Create a new instance of the Daily REST helper. daily_api_key string required Your Daily API key daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) Create Room Creates a new Daily room with specified parameters. params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) Get Room From URL Retrieves room information using a Daily room URL. room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) Get Token Generates a meeting token for a specific room. room_url string required The complete Daily room URL expiry_time float default: "3600" Token expiration time in seconds eject_at_token_exp bool default: "False" Whether to eject user when token expires owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) Delete Room By URL Deletes a room using its URL. room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) Delete Room By Name Deletes a room using its name. room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) Get Name From URL Extracts the room name from a Daily room URL. 
room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
|
daily_rest-helpers_e4cf41d3.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-enable-emoji-reactions
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details Classes DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. display_name string default: "sw-sip-dialin" Display name for the SIP endpoint video boolean default: false Whether video is enabled for SIP sip_mode string default: "dial-in" SIP connection mode num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. bucket_name string required Name of the S3 bucket for storing recordings bucket_region string required AWS region where the S3 bucket is located assume_role_arn string required ARN of the IAM role to assume for S3 access allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) DailyRoomProperties Properties that configure a Daily room’s behavior and features. exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) enable_chat boolean default: false Whether chat is enabled in the room enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled enable_emoji_reactions boolean default: false Whether emoji reactions are enabled eject_at_room_exp boolean default: false Whether to eject participants when room expires enable_dialout boolean Whether dial-out is enabled enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) geo string Geographic region for room max_participants number Maximum number of participants allowed in the room recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings sip DailyRoomSipParams SIP configuration parameters sip_uri dict SIP URI configuration (returned by Daily) start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) DailyRoomParams Parameters for creating a new Daily room. name string Room name (if not provided, one will be generated) privacy string default: "public" Room privacy setting (“private” or “public”) properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) DailyRoomObject Response object representing a Daily room. id string Unique room identifier name string Room name api_created boolean Whether the room was created via API privacy string Room privacy setting url string Complete room URL created_at string Room creation timestamp in ISO 8601 format config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) DailyMeetingTokenProperties Properties for configuring a Daily meeting token. room_name string The room this token is valid for. If not set, token is valid for all rooms. eject_at_token_exp boolean Whether to eject user when token expires eject_after_elapsed integer Eject user after this many seconds nbf integer “Not before” timestamp - users cannot join before this time exp integer Expiration timestamp - users cannot join after this time is_owner boolean Whether token grants owner privileges user_name string User’s display name in the meeting user_id string Unique identifier for the user (36 char limit) enable_screenshare boolean Whether user can share their screen start_video_off boolean Whether to join with video off start_audio_off boolean Whether to join with audio off enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) enable_prejoin_ui boolean Whether to show prejoin UI start_cloud_recording boolean Whether to start cloud recording when user joins permissions dict Initial default permissions for a non-meeting-owner participant DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) Initialize DailyRESTHelper Create a new instance of the Daily REST helper. daily_api_key string required Your Daily API key daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) Create Room Creates a new Daily room with specified parameters. params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) Get Room From URL Retrieves room information using a Daily room URL. room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) Get Token Generates a meeting token for a specific room. room_url string required The complete Daily room URL expiry_time float default: "3600" Token expiration time in seconds eject_at_token_exp bool default: "False" Whether to eject user when token expires owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) Delete Room By URL Deletes a room using its URL. room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) Delete Room By Name Deletes a room using its name. room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) Get Name From URL Extracts the room name from a Daily room URL. 
room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
|
daily_rest-helpers_f6a78696.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#delete-room-by-url
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details Classes DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. display_name string default: "sw-sip-dialin" Display name for the SIP endpoint video boolean default: false Whether video is enabled for SIP sip_mode string default: "dial-in" SIP connection mode num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. bucket_name string required Name of the S3 bucket for storing recordings bucket_region string required AWS region where the S3 bucket is located assume_role_arn string required ARN of the IAM role to assume for S3 access allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) DailyRoomProperties Properties that configure a Daily room’s behavior and features. exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) enable_chat boolean default: false Whether chat is enabled in the room enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled enable_emoji_reactions boolean default: false Whether emoji reactions are enabled eject_at_room_exp boolean default: false Whether to eject participants when room expires enable_dialout boolean Whether dial-out is enabled enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) geo string Geographic region for room max_participants number Maximum number of participants allowed in the room recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings sip DailyRoomSipParams SIP configuration parameters sip_uri dict SIP URI configuration (returned by Daily) start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) DailyRoomParams Parameters for creating a new Daily room. name string Room name (if not provided, one will be generated) privacy string default: "public" Room privacy setting (“private” or “public”) properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) DailyRoomObject Response object representing a Daily room. id string Unique room identifier name string Room name api_created boolean Whether the room was created via API privacy string Room privacy setting url string Complete room URL created_at string Room creation timestamp in ISO 8601 format config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) DailyMeetingTokenProperties Properties for configuring a Daily meeting token. room_name string The room this token is valid for. If not set, token is valid for all rooms. eject_at_token_exp boolean Whether to eject user when token expires eject_after_elapsed integer Eject user after this many seconds nbf integer “Not before” timestamp - users cannot join before this time exp integer Expiration timestamp - users cannot join after this time is_owner boolean Whether token grants owner privileges user_name string User’s display name in the meeting user_id string Unique identifier for the user (36 char limit) enable_screenshare boolean Whether user can share their screen start_video_off boolean Whether to join with video off start_audio_off boolean Whether to join with audio off enable_recording string Recording settings (“cloud”, “local”, or “raw-tracks”) enable_prejoin_ui boolean Whether to show prejoin UI start_cloud_recording boolean Whether to start cloud recording when user joins permissions dict Initial default permissions for a non-meeting-owner participant DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) Initialize DailyRESTHelper Create a new instance of the Daily REST helper. daily_api_key string required Your Daily API key daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) Create Room Creates a new Daily room with specified parameters. params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) Get Room From URL Retrieves room information using a Daily room URL. room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) Get Token Generates a meeting token for a specific room. room_url string required The complete Daily room URL expiry_time float default: "3600" Token expiration time in seconds eject_at_token_exp bool default: "False" Whether to eject user when token expires owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) Delete Room By URL Deletes a room using its URL. room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) Delete Room By Name Deletes a room using its name. room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) Get Name From URL Extracts the room name from a Daily room URL. 
room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
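Putting the helper methods together, here is a minimal end-to-end sketch: it creates a short-lived private room, issues an owner token for the bot and a regular token for the user, and deletes the room afterwards. It assumes a DAILY_API_KEY environment variable and the default Daily API URL; adapt it to your own setup.

```python
import asyncio
import os
import time

import aiohttp

from pipecat.transports.services.helpers.daily_rest import (
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
)


async def main():
    async with aiohttp.ClientSession() as session:
        helper = DailyRESTHelper(
            daily_api_key=os.getenv("DAILY_API_KEY", ""),
            aiohttp_session=session,
        )

        # Create a private room that expires in one hour
        room = await helper.create_room(
            DailyRoomParams(
                privacy="private",
                properties=DailyRoomProperties(exp=time.time() + 3600),
            )
        )
        print(f"Room URL: {room.url}")

        # Owner token for the bot, regular token for the user
        bot_token = await helper.get_token(room_url=room.url, expiry_time=3600, owner=True)
        user_token = await helper.get_token(room_url=room.url, expiry_time=3600, owner=False)

        # ... hand room.url plus the tokens to your bot process and client ...

        # Clean up once the session is finished
        deleted = await helper.delete_room_by_url(room.url)
        print(f"Room deleted: {deleted}")


if __name__ == "__main__":
    asyncio.run(main())
```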
|
deployment_cerebrium_b676596a.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/deployment/cerebrium#how-it-works
|
2 |
+
Title: Overview - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
|
deployment_fly_b244f040.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/guides/deployment/fly
|
2 |
+
Title: Example: Fly.io - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Example: Fly.io - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Deploying your bot Example: Fly.io Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Project setup Let’s explore how we can use fly.io to make our app scalable for production by spawning our Pipecat bots on virtual machines with their own resources. We mentioned before that you would ideally containerize the bot_runner.py web service and the bot.py separately. To keep this example simple, we’ll use the same container image for both services. Install the Fly CLI You can find instructions for creating and setting up your fly account here . Creating the Pipecat project We have created a template project here which you can clone. Since we’re targeting production use-cases, this example uses Daily (WebRTC) as a transport, but you can configure your bot however you like. Adding a fly.toml Add a fly.toml to the root of your project directory. Here is a basic example: fly.toml Copy Ask AI app = 'some-unique-app-name' primary_region = 'sjc' [ build ] [ env ] FLY_APP_NAME = 'some-unique-app-name' [ http_service ] internal_port = 7860 force_https = true auto_stop_machines = true auto_start_machines = true min_machines_running = 0 processes = [ 'app' ] [[ vm ]] memory = 512 cpu_kind = 'shared' cpus = 1 For apps with lots of users, consider what resources your HTTP service will require to meet load. We’ll define our bot.py resources later, so you can set and scale these as you like ( fly scale ... ) Environment setup Our bot requires some API keys and configuration, so create a .env in your project root: .env Copy Ask AI DAILY_API_KEY = OPENAI_API_KEY = ELEVENLABS_API_KEY = ELEVENLABS_VOICE_ID = FLY_API_KEY = FLY_APP_NAME = Of course, the exact keys you need will depend on which services you are using within your bot.py . Important: your FLY_APP_NAME should match the name of your fly instance, such as that declared in your fly.toml. The .env will allow us to test in local development, but is not included in the deployment. You’ll need to set them as Fly app secrets, which you can do via the Fly dashboard or cli. fly secrets set ... Containerize our app Our Fly deployment will need a container image; let’s create a simple Dockerfile in the root of the project: Dockerfile .dockerignore Copy Ask AI FROM python:3.11-slim-bookworm # Open port 7860 for http service ENV FAST_API_PORT= 7860 EXPOSE 7860 # Install Python dependencies COPY \* .py . COPY ./requirements.txt requirements.txt RUN pip3 install --no-cache-dir --upgrade -r requirements.txt # Install models RUN python3 install_deps.py # Start the FastAPI server CMD python3 bot_runner.py --port ${ FAST_API_PORT } You can use any base image as long as Python is available Our container does the following: Opens port 7860 to serve our bot_runner.py FastAPI service. Downloads the necessary python dependencies. Download / cache the model dependencies the bot.py requires. 
Runs the bot_runner.py and listens for web requests. What models are we downloading? To support voice activity detection, we’re using Silero VAD. Whilst the filesize is not huge, having each new machine download the Silero model at runtime will impact bootup time. Instead, we include the model as part of the Docker image so it’s cached and available. You could, of course, also attach a network volume to each instance if you plan to include larger files as part of your deployment and don’t want to bloat the size of your image. Launching new machines in bot_runner.py When a user starts a session with our Pipecat bot, we want to launch a new machine on fly.io with it’s own system resources. Let’s grab the bot_runner.py from the example repo here . This runner differs from others in the Pipecat repo; we’ve added a new method that sends a REST request to Fly to provision a new machine for the session. This method is invoked as part of the /start_bot endpoint: bot_runner.py Copy Ask AI FLY_API_HOST = os.getenv( "FLY_API_HOST" , "https://api.machines.dev/v1" ) FLY_APP_NAME = os.getenv( "FLY_APP_NAME" , "your-fly-app-name" ) FLY_API_KEY = os.getenv( "FLY_API_KEY" , "" ) FLY_HEADERS = { 'Authorization' : f "Bearer { FLY_API_KEY } " , 'Content-Type' : 'application/json' } def spawn_fly_machine ( room_url : str , token : str ): # Use the same image as the bot runner res = requests.get( f " { FLY_API_HOST } /apps/ { FLY_APP_NAME } /machines" , headers = FLY_HEADERS ) if res.status_code != 200 : raise Exception ( f "Unable to get machine info from Fly: { res.text } " ) image = res.json()[ 0 ][ 'config' ][ 'image' ] # Machine configuration cmd = f "python3 bot.py -u { room_url } -t { token } " cmd = cmd.split() worker_props = { "config" : { "image" : image, "auto_destroy" : True , "init" : { "cmd" : cmd }, "restart" : { "policy" : "no" }, "guest" : { "cpu_kind" : "shared" , "cpus" : 1 , "memory_mb" : 1024 # Note: 512 is just enough to run VAD, but 1gb is better } }, } # Spawn a new machine instance res = requests.post( f " { FLY_API_HOST } /apps/ { FLY_APP_NAME } /machines" , headers = FLY_HEADERS , json = worker_props) if res.status_code != 200 : raise Exception ( f "Problem starting a bot worker: { res.text } " ) # Wait for the machine to enter the started state vm_id = res.json()[ 'id' ] res = requests.get( f " { FLY_API_HOST } /apps/ { FLY_APP_NAME } /machines/ { vm_id } /wait?state=started" , headers = FLY_HEADERS ) if res.status_code != 200 : raise Exception ( f "Bot was unable to enter started state: { res.text } " ) We want to make sure the machine started ok before returning any data to the user. Fly launches machines pretty fast, but will timeout if things take longer than they should. Depending on your transport method, you may want to optimistically return a response to the user, so they can join the room and poll for the status of their bot. Launch the Fly project Getting your bot on Fly is as simple as: fly launch or fly launch --org orgname if you’re part of a team. This will step you through some configuration, and build and deploy your Docker image. Be sure to configure your app secrets with the necessary environment variables once the deployment has complete. Assuming all goes well, you can update with any changes with fly deploy . Test it out Start a new bot instance by sending a POST request to https://your-fly-url.fly.dev/start_bot . All being well, this will return a room URL and token. 
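Once the app is live, a short script makes a convenient smoke test. This is only a sketch: the hostname is a placeholder for your own Fly app, and the exact response fields depend on how your bot_runner.py builds its JSON (the template returns a room URL and token).

```python
# Smoke-test the deployed runner: ask it to spawn a bot and print the join details.
import requests

response = requests.post("https://your-fly-url.fly.dev/start_bot", timeout=60)
response.raise_for_status()

data = response.json()
print("Room URL:", data.get("room_url"))
print("User token:", data.get("token"))
```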
A nice feature of Fly is the ability to monitor your machines (with live logs) via their dashboard: https://fly.io/apps/YOUR-APP_NAME/machines This is really helpful for monitoring the status of your spawned machine, and for debugging if things do not work as expected. Sessions in this example are configured to expire after 5 minutes. The bot process is also configured to exit after the user leaves the room. This is a good way to ensure we don’t have any hanging VMs, although you’ll likely need to configure this behaviour to meet your own needs. You’ll also notice that we set the restart policy to no . This prevents the machine from attempting to restart after the session has concluded and the process exits. Important considerations This example does little in the way of load balancing or app security. Indeed, a user can spawn a new machine on your account simply by sending a POST request to the bot_runner.py . Be sure to configure a maximum number of instances, or authenticate requests, to avoid costs getting out of control. We also deployed our bot.py on a machine with the same image as our bot_runner.py . To optimize container file sizes and increase security, consider separate images that include only the resources each service requires. Example: Pipecat Cloud Example: Cerebrium On this page Project setup Install the Fly CLI Creating the Pipecat project Adding a fly.toml Environment setup Containerize our app What models are we downloading? Launching new machines in bot_runner.py Launch the Fly project Test it out Important considerations Assistant Responses are generated using AI and may contain mistakes.
|
deployment_pattern_cabfd79c.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/guides/deployment/pattern#data-transport
|
2 |
+
Title: Deployment pattern - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Deployment pattern - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Deploying your bot Deployment pattern Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Project structure A Pipecat project will often consist of the following: 1. Bot file E.g. bot.py . Your Pipecat bot / agent, containing all the pipelines that you want to run in order to communicate with an end-user. A bot file may take some command line arguments, such as a transport URL and configuration. 2. Bot runner E.g. bot_runner.py . Typically a basic HTTP service that listens for incoming user requests and spawns the relevant bot file in response. You can call these files whatever you like! We use bot.py and bot_runner.py for simplicity. Typical user / bot flow There are many ways to approach connecting users to bots. Pipecat is unopinionated about how exactly you should do this, but it’s helpful to put an idea forward. At a very basic level, it may look something like this: 1 User requests to join session via client / app Client initiates a HTTP request to a hosted bot runner service. 2 Bot runner handles the request Authenticates, configures and instantiates everything necessary for the session to commence (e.g. a new WebSocket channel, or WebRTC room, etc.) 3 Bot runner spawns bot / agent A new bot process / VM is created for the user to connect with (passing across any necessary configuration.) Your project may load just one bot file, contextually swap between multiple, or launch many at once. 4 Bot instantiates and joins session via specified transport credentials Bot initializes, connects to the session (e.g. locally or via WebSockets, WebRTC etc) and runs your bot code. 5 Bot runner returns status to client Once the bot is ready, the runner resolves the HTTP request with details for the client to connect. Bot runner The majority of use-cases require a way to trigger and manage a bot session over the internet. We call these bot runners; a HTTP service that provides a gateway for spawning bots on-demand. The anatomy of a bot runner service is entirery arbitrary, but at very least will have a method that spawns a new bot process, for example: Copy Ask AI import uvicorn from fastapi import FastAPI, Request, HTTPException from fastapi.responses import JSONResponse app = FastAPI() @app.post ( "/start_bot" ) async def start_bot ( request : Request) -> JSONResponse: # ... handle / authenticate the request # ... setup the transport session # Spawn a new bot process try : #... create a new bot instance except Exception as e: raise HTTPException( status_code = 500 , detail = f "Failed to start bot: { e } " ) # Return a URL for the user to join return JSONResponse({ ... 
}) if __name__ == "__main__" : uvicorn.run( "bot_runner:app" , host = "0.0.0.0" , port = 7860 ) This pseudo code defines a /start_bot/ endpoint which listens for incoming user POST requests or webhooks, then configures the session (such as creating rooms on your transport provider) and instantiates a new bot process. A client will typically require some information regarding the newly spawned bot, such as a web address, so we also return some JSON with the necessary details. Data transport Your transport layer is responsible for sending and receiving audio and video data over the internet. You will have implemented a transport layer as part of your bot.py pipeline. This may be a service that you want to host and include in your deployment, or it may be a third-party service waiting for peers to connect (such as Daily , or a websocket.) For this example, we will make use of Daily’s WebRTC transport. This will mean that our bot_runner.py will need to do some configuration when it spawns a new bot: Create and configure a new Daily room for the session to take place in. Issue both the bot and the user an authentication token to join the session. Whatever you use for your transport layer, you’ll likely need to setup some environmental variables and run some custom code before spawning the agent. Best practice for bot files A good pattern to work to is the assumption that your bot.py is an encapsulated entity and does not have any knowledge of the bot_runner.py . You should provide the bot everything it needs to operate during instantiation. Sticking to this approach helps keep things simple and makes it easier to step through debugging (if the bot launched and something goes wrong, you know to look for errors in your bot file.) Example Let’s assume we have a fully service-driven bot.py that connects to a WebRTC session, passes audio transcription to GPT4 and returns audio text-to-speech with ElevenLabs. We’ll also use Silero voice activity detection, to better know when the user has stopped talking. 
bot.py Copy Ask AI import asyncio import aiohttp import os import sys import argparse from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator from pipecat.frames.frames import LLMMessagesFrame, EndFrame from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.transports.services.daily import DailyParams, DailyTransport from pipecat.vad.silero import SileroVADAnalyzer from loguru import logger from dotenv import load_dotenv load_dotenv( override = True ) logger.remove( 0 ) logger.add(sys.stderr, level = "DEBUG" ) daily_api_key = os.getenv( "DAILY_API_KEY" , "" ) daily_api_url = os.getenv( "DAILY_API_URL" , "https://api.daily.co/v1" ) async def main ( room_url : str , token : str ): async with aiohttp.ClientSession() as session: transport = DailyTransport( room_url, token, "Chatbot" , DailyParams( api_url = daily_api_url, api_key = daily_api_key, audio_in_enabled = True , audio_out_enabled = True , video_out_enabled = False , vad_analyzer = SileroVADAnalyzer(), transcription_enabled = True , ) ) tts = ElevenLabsTTSService( aiohttp_session = session, api_key = os.getenv( "ELEVENLABS_API_KEY" , "" ), voice_id = os.getenv( "ELEVENLABS_VOICE_ID" , "" ), ) llm = OpenAILLMService( api_key = os.getenv( "OPENAI_API_KEY" ), model = "gpt-4o" ) messages = [ { "role" : "system" , "content" : "You are Chatbot, a friendly, helpful robot. Your output will be converted to audio so don't include special characters other than '!' or '?' in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying hello." , }, ] tma_in = LLMUserResponseAggregator(messages) tma_out = LLMAssistantResponseAggregator(messages) pipeline = Pipeline([ transport.input(), tma_in, llm, tts, transport.output(), tma_out, ]) task = PipelineTask(pipeline, params = PipelineParams( allow_interruptions = True )) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): transport.capture_participant_transcription(participant[ "id" ]) await task.queue_frames([LLMMessagesFrame(messages)]) @transport.event_handler ( "on_participant_left" ) async def on_participant_left ( transport , participant , reason ): await task.queue_frame(EndFrame()) @transport.event_handler ( "on_call_state_updated" ) async def on_call_state_updated ( transport , state ): if state == "left" : await task.queue_frame(EndFrame()) runner = PipelineRunner() await runner.run(task) if __name__ == "__main__" : parser = argparse.ArgumentParser( description = "Pipecat Bot" ) parser.add_argument( "-u" , type = str , help = "Room URL" ) parser.add_argument( "-t" , type = str , help = "Token" ) config = parser.parse_args() asyncio.run(main(config.u, config.t)) HTTP API To launch this bot, let’s create a bot_runner.py that: Creates an API for users to send requests to. Launches a bot as a subprocess. 
bot_runner.py Copy Ask AI import os import argparse import subprocess from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams from fastapi import FastAPI, Request, HTTPException from fastapi.middleware.cors import CORSMiddleware from fastapi.responses import JSONResponse # Load API keys from env from dotenv import load_dotenv load_dotenv( override = True ) # ------------ Configuration ------------ # MAX_SESSION_TIME = 5 * 60 # 5 minutes # List of require env vars our bot requires REQUIRED_ENV_VARS = [ 'DAILY_API_KEY' , 'OPENAI_API_KEY' , 'ELEVENLABS_API_KEY' , 'ELEVENLABS_VOICE_ID' ] daily_rest_helper = DailyRESTHelper( os.getenv( "DAILY_API_KEY" , "" ), os.getenv( "DAILY_API_URL" , 'https://api.daily.co/v1' )) # ----------------- API ----------------- # app = FastAPI() app.add_middleware( CORSMiddleware, allow_origins = [ "*" ], allow_credentials = True , allow_methods = [ "*" ], allow_headers = [ "*" ] ) # ----------------- Main ----------------- # @app.post ( "/start_bot" ) async def start_bot ( request : Request) -> JSONResponse: try : # Grab any data included in the post request data = await request.json() except Exception as e: pass # Create a new Daily WebRTC room for the session to take place in try : params = DailyRoomParams( properties = DailyRoomProperties() ) room: DailyRoomObject = daily_rest_helper.create_room( params = params) except Exception as e: raise HTTPException( status_code = 500 , detail = f "Unable to provision room { e } " ) # Give the agent a token to join the session token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME ) # Return an error if we were unable to create a room or a token if not room or not token: raise HTTPException( status_code = 500 , detail = f "Failed to get token for room: { room_url } " ) try : # Start a new subprocess, passing the room and token to the bot file subprocess.Popen( [ f "python3 -m bot -u { room.url } -t { token } " ], shell = True , bufsize = 1 , cwd = os.path.dirname(os.path.abspath( __file__ ))) except Exception as e: raise HTTPException( status_code = 500 , detail = f "Failed to start subprocess: { e } " ) # Grab a token for the user to join with user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME ) # Return the room url and user token back to the user return JSONResponse({ "room_url" : room.url, "token" : user_token, }) if __name__ == "__main__" : # Check for required environment variables for env_var in REQUIRED_ENV_VARS : if env_var not in os.environ: raise Exception ( f "Missing environment variable: { env_var } ." ) parser = argparse.ArgumentParser( description = "Pipecat Bot Runner" ) parser.add_argument( "--host" , type = str , default = os.getenv( "HOST" , "0.0.0.0" ), help = "Host address" ) parser.add_argument( "--port" , type = int , default = os.getenv( "PORT" , 7860 ), help = "Port number" ) parser.add_argument( "--reload" , action = "store_true" , default = False , help = "Reload code on change" ) config = parser.parse_args() try : import uvicorn uvicorn.run( "bot_runner:app" , host = config.host, port = config.port, reload = config.reload ) except KeyboardInterrupt : print ( "Pipecat runner shutting down..." ) Dockerfile Since our bot is just using Python, our Dockerfile can be quite simple: Dockerfile install_deps.py Copy Ask AI FROM python:3.11-bullseye # Open port 7860 for http service ENV FAST_API_PORT= 7860 EXPOSE 7860 # Install Python dependencies COPY \* .py . 
COPY ./requirements.txt requirements.txt RUN pip3 install --no-cache-dir --upgrade -r requirements.txt # Install models RUN python3 install_deps.py # Start the FastAPI server CMD python3 bot_runner.py --port ${ FAST_API_PORT } The bot runner and bot requirements.txt : requirements.txt Copy Ask AI pipecat-ai[daily,openai,silero] fastapi uvicorn python-dotenv And finally, let’s create a .env file with our service keys: .env Copy Ask AI DAILY_API_KEY = ... OPENAI_API_KEY = ... ELEVENLABS_API_KEY = ... ELEVENLABS_VOICE_ID = ... How it works Right now, this runner spawns bot.py as a subprocess. When spawning the process, we pass the transport room URL and token as command-line arguments to our bot, so it knows where to connect. Subprocesses are a great way to test out your bot in the cloud without too much hassle, but depending on the size of the host machine, this approach will likely not hold up well under load. Whilst some bots are just simple operators between the transport and third-party AI services (such as OpenAI), others have somewhat CPU-intensive operations, such as loading and running VAD models, so you may find you’re only able to scale this to support 5-10 concurrent bots. Scaling your setup would require virtualizing your bot with its own set of system resources, the process of which depends on your cloud provider. Best practices In an ideal world, we’d recommend containerizing your bot and bot runner independently so you can deploy each without any unnecessary dependencies or models. Most cloud providers will offer a way to deploy various images programmatically, which we explore in the various provider examples in these docs. For the sake of simplicity in defining this pattern, we’re just using one container for everything. Build and run We should now have a project that contains the following files: bot.py bot_runner.py requirements.txt .env Dockerfile You can now docker build ... and deploy your container. Of course, you can still work with your bot in local development too: Copy Ask AI # Install and activate a virtual env python -m venv venv source venv/bin/activate # or OS equivalent pip install -r requirements.txt python bot_runner.py --host localhost --reload Overview Example: Pipecat Cloud On this page Project structure Typical user / bot flow Bot runner Data transport Best practice for bot files Example HTTP API Dockerfile How it works Best practices Build and run Assistant Responses are generated using AI and may contain mistakes.
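The Dockerfile above runs install_deps.py at build time so that model files are baked into the image rather than fetched when a machine boots. The real script ships with the example repo; purely as an illustrative, hypothetical sketch (assuming the Silero VAD weights are pulled from torch hub), it might look like this:

```python
# install_deps.py (hypothetical sketch): pre-download model weights at image
# build time so new containers don't fetch them on startup.
import torch

# Loading the Silero VAD model caches it in torch hub's cache directory,
# which then ships inside the Docker image.
torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
```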
|
deployment_pattern_e013764a.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/guides/deployment/pattern#best-practice-for-bot-files
|
2 |
+
Title: Deployment pattern - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Deployment pattern - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Deploying your bot Deployment pattern Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Project structure A Pipecat project will often consist of the following: 1. Bot file E.g. bot.py . Your Pipecat bot / agent, containing all the pipelines that you want to run in order to communicate with an end-user. A bot file may take some command line arguments, such as a transport URL and configuration. 2. Bot runner E.g. bot_runner.py . Typically a basic HTTP service that listens for incoming user requests and spawns the relevant bot file in response. You can call these files whatever you like! We use bot.py and bot_runner.py for simplicity. Typical user / bot flow There are many ways to approach connecting users to bots. Pipecat is unopinionated about how exactly you should do this, but it’s helpful to put an idea forward. At a very basic level, it may look something like this: 1 User requests to join session via client / app Client initiates a HTTP request to a hosted bot runner service. 2 Bot runner handles the request Authenticates, configures and instantiates everything necessary for the session to commence (e.g. a new WebSocket channel, or WebRTC room, etc.) 3 Bot runner spawns bot / agent A new bot process / VM is created for the user to connect with (passing across any necessary configuration.) Your project may load just one bot file, contextually swap between multiple, or launch many at once. 4 Bot instantiates and joins session via specified transport credentials Bot initializes, connects to the session (e.g. locally or via WebSockets, WebRTC etc) and runs your bot code. 5 Bot runner returns status to client Once the bot is ready, the runner resolves the HTTP request with details for the client to connect. Bot runner The majority of use-cases require a way to trigger and manage a bot session over the internet. We call these bot runners; a HTTP service that provides a gateway for spawning bots on-demand. The anatomy of a bot runner service is entirery arbitrary, but at very least will have a method that spawns a new bot process, for example: Copy Ask AI import uvicorn from fastapi import FastAPI, Request, HTTPException from fastapi.responses import JSONResponse app = FastAPI() @app.post ( "/start_bot" ) async def start_bot ( request : Request) -> JSONResponse: # ... handle / authenticate the request # ... setup the transport session # Spawn a new bot process try : #... create a new bot instance except Exception as e: raise HTTPException( status_code = 500 , detail = f "Failed to start bot: { e } " ) # Return a URL for the user to join return JSONResponse({ ... 
}) if __name__ == "__main__" : uvicorn.run( "bot_runner:app" , host = "0.0.0.0" , port = 7860 ) This pseudo code defines a /start_bot/ endpoint which listens for incoming user POST requests or webhooks, then configures the session (such as creating rooms on your transport provider) and instantiates a new bot process. A client will typically require some information regarding the newly spawned bot, such as a web address, so we also return some JSON with the necessary details. Data transport Your transport layer is responsible for sending and receiving audio and video data over the internet. You will have implemented a transport layer as part of your bot.py pipeline. This may be a service that you want to host and include in your deployment, or it may be a third-party service waiting for peers to connect (such as Daily , or a websocket.) For this example, we will make use of Daily’s WebRTC transport. This will mean that our bot_runner.py will need to do some configuration when it spawns a new bot: Create and configure a new Daily room for the session to take place in. Issue both the bot and the user an authentication token to join the session. Whatever you use for your transport layer, you’ll likely need to setup some environmental variables and run some custom code before spawning the agent. Best practice for bot files A good pattern to work to is the assumption that your bot.py is an encapsulated entity and does not have any knowledge of the bot_runner.py . You should provide the bot everything it needs to operate during instantiation. Sticking to this approach helps keep things simple and makes it easier to step through debugging (if the bot launched and something goes wrong, you know to look for errors in your bot file.) Example Let’s assume we have a fully service-driven bot.py that connects to a WebRTC session, passes audio transcription to GPT4 and returns audio text-to-speech with ElevenLabs. We’ll also use Silero voice activity detection, to better know when the user has stopped talking. 
bot.py Copy Ask AI import asyncio import aiohttp import os import sys import argparse from pipecat.pipeline.pipeline import Pipeline from pipecat.pipeline.runner import PipelineRunner from pipecat.pipeline.task import PipelineParams, PipelineTask from pipecat.processors.aggregators.llm_response import LLMAssistantResponseAggregator, LLMUserResponseAggregator from pipecat.frames.frames import LLMMessagesFrame, EndFrame from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.transports.services.daily import DailyParams, DailyTransport from pipecat.vad.silero import SileroVADAnalyzer from loguru import logger from dotenv import load_dotenv load_dotenv( override = True ) logger.remove( 0 ) logger.add(sys.stderr, level = "DEBUG" ) daily_api_key = os.getenv( "DAILY_API_KEY" , "" ) daily_api_url = os.getenv( "DAILY_API_URL" , "https://api.daily.co/v1" ) async def main ( room_url : str , token : str ): async with aiohttp.ClientSession() as session: transport = DailyTransport( room_url, token, "Chatbot" , DailyParams( api_url = daily_api_url, api_key = daily_api_key, audio_in_enabled = True , audio_out_enabled = True , video_out_enabled = False , vad_analyzer = SileroVADAnalyzer(), transcription_enabled = True , ) ) tts = ElevenLabsTTSService( aiohttp_session = session, api_key = os.getenv( "ELEVENLABS_API_KEY" , "" ), voice_id = os.getenv( "ELEVENLABS_VOICE_ID" , "" ), ) llm = OpenAILLMService( api_key = os.getenv( "OPENAI_API_KEY" ), model = "gpt-4o" ) messages = [ { "role" : "system" , "content" : "You are Chatbot, a friendly, helpful robot. Your output will be converted to audio so don't include special characters other than '!' or '?' in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by saying hello." , }, ] tma_in = LLMUserResponseAggregator(messages) tma_out = LLMAssistantResponseAggregator(messages) pipeline = Pipeline([ transport.input(), tma_in, llm, tts, transport.output(), tma_out, ]) task = PipelineTask(pipeline, params = PipelineParams( allow_interruptions = True )) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): transport.capture_participant_transcription(participant[ "id" ]) await task.queue_frames([LLMMessagesFrame(messages)]) @transport.event_handler ( "on_participant_left" ) async def on_participant_left ( transport , participant , reason ): await task.queue_frame(EndFrame()) @transport.event_handler ( "on_call_state_updated" ) async def on_call_state_updated ( transport , state ): if state == "left" : await task.queue_frame(EndFrame()) runner = PipelineRunner() await runner.run(task) if __name__ == "__main__" : parser = argparse.ArgumentParser( description = "Pipecat Bot" ) parser.add_argument( "-u" , type = str , help = "Room URL" ) parser.add_argument( "-t" , type = str , help = "Token" ) config = parser.parse_args() asyncio.run(main(config.u, config.t)) HTTP API To launch this bot, let’s create a bot_runner.py that: Creates an API for users to send requests to. Launches a bot as a subprocess. 
bot_runner.py Copy Ask AI import os import argparse import subprocess from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper, DailyRoomObject, DailyRoomProperties, DailyRoomParams from fastapi import FastAPI, Request, HTTPException from fastapi.middleware.cors import CORSMiddleware from fastapi.responses import JSONResponse # Load API keys from env from dotenv import load_dotenv load_dotenv( override = True ) # ------------ Configuration ------------ # MAX_SESSION_TIME = 5 * 60 # 5 minutes # List of require env vars our bot requires REQUIRED_ENV_VARS = [ 'DAILY_API_KEY' , 'OPENAI_API_KEY' , 'ELEVENLABS_API_KEY' , 'ELEVENLABS_VOICE_ID' ] daily_rest_helper = DailyRESTHelper( os.getenv( "DAILY_API_KEY" , "" ), os.getenv( "DAILY_API_URL" , 'https://api.daily.co/v1' )) # ----------------- API ----------------- # app = FastAPI() app.add_middleware( CORSMiddleware, allow_origins = [ "*" ], allow_credentials = True , allow_methods = [ "*" ], allow_headers = [ "*" ] ) # ----------------- Main ----------------- # @app.post ( "/start_bot" ) async def start_bot ( request : Request) -> JSONResponse: try : # Grab any data included in the post request data = await request.json() except Exception as e: pass # Create a new Daily WebRTC room for the session to take place in try : params = DailyRoomParams( properties = DailyRoomProperties() ) room: DailyRoomObject = daily_rest_helper.create_room( params = params) except Exception as e: raise HTTPException( status_code = 500 , detail = f "Unable to provision room { e } " ) # Give the agent a token to join the session token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME ) # Return an error if we were unable to create a room or a token if not room or not token: raise HTTPException( status_code = 500 , detail = f "Failed to get token for room: { room_url } " ) try : # Start a new subprocess, passing the room and token to the bot file subprocess.Popen( [ f "python3 -m bot -u { room.url } -t { token } " ], shell = True , bufsize = 1 , cwd = os.path.dirname(os.path.abspath( __file__ ))) except Exception as e: raise HTTPException( status_code = 500 , detail = f "Failed to start subprocess: { e } " ) # Grab a token for the user to join with user_token = daily_rest_helper.get_token(room.url, MAX_SESSION_TIME ) # Return the room url and user token back to the user return JSONResponse({ "room_url" : room.url, "token" : user_token, }) if __name__ == "__main__" : # Check for required environment variables for env_var in REQUIRED_ENV_VARS : if env_var not in os.environ: raise Exception ( f "Missing environment variable: { env_var } ." ) parser = argparse.ArgumentParser( description = "Pipecat Bot Runner" ) parser.add_argument( "--host" , type = str , default = os.getenv( "HOST" , "0.0.0.0" ), help = "Host address" ) parser.add_argument( "--port" , type = int , default = os.getenv( "PORT" , 7860 ), help = "Port number" ) parser.add_argument( "--reload" , action = "store_true" , default = False , help = "Reload code on change" ) config = parser.parse_args() try : import uvicorn uvicorn.run( "bot_runner:app" , host = config.host, port = config.port, reload = config.reload ) except KeyboardInterrupt : print ( "Pipecat runner shutting down..." ) Dockerfile Since our bot is just using Python, our Dockerfile can be quite simple: Dockerfile install_deps.py Copy Ask AI FROM python:3.11-bullseye # Open port 7860 for http service ENV FAST_API_PORT= 7860 EXPOSE 7860 # Install Python dependencies COPY \* .py . 
COPY ./requirements.txt requirements.txt RUN pip3 install --no-cache-dir --upgrade -r requirements.txt # Install models RUN python3 install_deps.py # Start the FastAPI server CMD python3 bot_runner.py --port ${ FAST_API_PORT } The bot runner and bot requirements.txt : requirements.txt Copy Ask AI pipecat-ai[daily,openai,silero] fastapi uvicorn python-dotenv And finally, let’s create a .env file with our service keys .env Copy Ask AI DAILY_API_KEY = ... OPENAI_API_KEY = ... ELEVENLABS_API_KEY = ... ELEVENLABS_VOICE_ID = ... How it works Right now, this runner is spawning bot.py as a subprocess. When spawning the process, we pass through the transport room and token as system arguments to our bot, so it knows where to connect. Subprocesses serve as a great way to test out your bot in the cloud without too much hassle, but depending on the size of the host machine, it will likely not hold up well under load. Whilst some bots are just simple operators between the transport and third-party AI services (such as OpenAI), others have somewhat CPU-intensive operations, such as running and loading VAD models, so you may find you’re only able to scale this to support up to 5-10 concurrent bots. Scaling your setup would require virtualizing your bot with it’s own set of system resources, the process of which depends on your cloud provider. Best practices In an ideal world, we’d recommend containerizing your bot and bot runner independently so you can deploy each without any unnecessary dependencies or models. Most cloud providers will offer a way to deploy various images programmatically, which we explore in the various provider examples in these docs. For the sake of simplicity defining this pattern, we’re just using one container for everything. Build and run We should now have a project that contains the following files: bot.py bot_runner.py requirements.txt .env Dockerfile You can now docker build ... and deploy your container. Of course, you can still work with your bot in local development too: Copy Ask AI # Install and activate a virtual env python -m venv venv source venv/bin/activate # or OS equivalent pip install -r requirements.txt python bot_runner.py --host localhost --reload Overview Example: Pipecat Cloud On this page Project structure Typical user / bot flow Bot runner Data transport Best practice for bot files Example HTTP API Dockerfile How it works Best practices Build and run Assistant Responses are generated using AI and may contain mistakes.
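After starting the runner locally with python bot_runner.py --host localhost --reload, you can exercise the endpoint with a short script. A rough sketch, assuming the runner listens on port 7860 and returns the room_url and token fields shown above:

```python
# Ask the local runner to spawn a bot and print where to join.
import requests

response = requests.post("http://localhost:7860/start_bot", timeout=60)
response.raise_for_status()

data = response.json()
print("Join the bot at:", data["room_url"])
print("User token:", data["token"])
```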
|
deployment_pipecat-cloud_9cbcfcad.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/deployment/pipecat-cloud
|
2 |
+
Title: Overview - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
|
deployment_pipecat-cloud_a13c351c.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/deployment/pipecat-cloud#next-steps
|
2 |
+
Title: Overview - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
|
deployment_wwwflyio_fcbf6f91.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/guides/deployment/www.fly.io#join-our-community
|
2 |
+
Title: Overview - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Overview - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. “Multimodal” means you can use any combination of audio, video, images, and/or text in your interactions. And “real-time” means that things are happening quickly enough that it feels conversational—a “back-and-forth” with a bot, not submitting a query and waiting for results. What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing Pipecat’s pipeline architecture handles both simple voice interactions and complex multimodal processing. Let’s look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether you’re building a simple voice assistant or a complex multimodal application. Pipecat’s pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
features_gemini-multimodal-live_0d50e7fb.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/guides/features/gemini-multimodal-live#real-time-events
2 |
+
Title: Building with Gemini Multimodal Live - Pipecat
3 |
+
==================================================
4 |
+
5 |
+
Building with Gemini Multimodal Live - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Building with Gemini Multimodal Live Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal This guide will walk you through building a real-time AI chatbot using Gemini Multimodal Live and Pipecat. We’ll create a complete application with a Pipecat server and a Pipecat React client that enables natural conversations with an AI assistant. API Reference Gemini Multimodal Live API documentation Example Code Find the complete client and server code in Github Client SDK Pipecat React SDK documentation What We’ll Build In this guide, you’ll create: A FastAPI server that manages bot instances A Gemini-powered conversational AI bot A React client with real-time audio/video A complete pipeline for speech-to-speech interaction Key Concepts Before we dive into implementation, let’s cover some important concepts that will help you understand how Pipecat and Gemini work together. Understanding Pipelines At the heart of Pipecat is the pipeline system. A pipeline is a sequence of processors that handle different aspects of the conversation flow. Think of it like an assembly line where each station (processor) performs a specific task. For our chatbot, the pipeline looks like this: Copy Ask AI pipeline = Pipeline([ transport.input(), # Receives audio/video from the user via WebRTC rtvi, # Handles client/server messaging and events context_aggregator.user(), # Manages user message history llm, # Processes speech through Gemini talking_animation, # Controls bot's avatar transport.output(), # Sends audio/video back to the user via WebRTC context_aggregator.assistant(), # Manages bot message history ]) Processors Each processor in the pipeline handles a specific task: Transport transport.input() and transport.output() handle media streaming with Daily Context context_aggregator maintains conversation history for natural dialogue Speech Processing rtvi_user_transcription and rtvi_bot_transcription handle speech-to-text Animation talking_animation controls the bot’s visual state based on speaking activity The order of processors matters! Data flows through the pipeline in sequence, so each processor should receive the data it needs from previous processors. Learn more about the Core Concepts to Pipecat server. Gemini Integration The GeminiMultimodalLiveLLMService is a speech-to-speech LLM service that interfaces with the Gemini Multimodal Live API. It provides: Real-time speech-to-speech conversation Context management Voice activity detection Tool use Pipecat manages two types of connections: A WebRTC connection between the Pipecat client and server for reliable audio/video streaming A WebSocket connection between the Pipecat server and Gemini for real-time AI processing This architecture ensures stable media streaming while maintaining responsive AI interactions. 
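Tool use is listed above but not demonstrated in this walkthrough, so here is a rough sketch of how a function could be registered with the Gemini service. The import paths, the FunctionCallParams handler signature, the ToolsSchema usage, and the get_weather tool are assumptions drawn from Pipecat's function-calling guide rather than from this page; verify them against the current API reference before relying on them.

import os

from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveLLMService
from pipecat.services.llm_service import FunctionCallParams

# Describe the tool so the model knows when to call it (hypothetical example tool)
weather_function = FunctionSchema(
    name="get_weather",
    description="Get the current weather for a location",
    properties={"location": {"type": "string", "description": "City name"}},
    required=["location"],
)

llm = GeminiMultimodalLiveLLMService(
    api_key=os.getenv("GEMINI_API_KEY"),
    voice_id="Puck",
)

async def fetch_weather(params: FunctionCallParams):
    # Runs when Gemini decides to call the tool; return the result to the model
    location = params.arguments["location"]
    await params.result_callback({"location": location, "conditions": "sunny", "temperature_f": 72})

llm.register_function("get_weather", fetch_weather)

# Tools are handed to the model through the shared context
messages = [{"role": "user", "content": "You are a helpful assistant."}]
context = OpenAILLMContext(messages, tools=ToolsSchema(standard_tools=[weather_function]))
context_aggregator = llm.create_context_aggregator(context)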
Prerequisites Before we begin, you’ll need: Python 3.10 or higher Node.js 16 or higher A Daily API key A Google API key with Gemini Multimodal Live access Clone the Pipecat repo: Copy Ask AI git clone [email protected]:pipecat-ai/pipecat.git Server Implementation Let’s start by setting up the server components. Our server will handle bot management, room creation, and client connections. Environment Setup Navigate to the simple-chatbot’s server directory: Copy Ask AI cd examples/simple-chatbot/server Set up a python virtual environment: Copy Ask AI python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate Install requirements: Copy Ask AI pip install -r requirements.txt Copy env.example to .env and make a few changes: Copy Ask AI # Remove the hard-coded example room URL DAILY_SAMPLE_ROOM_URL = # Add your Daily and Gemini API keys DAILY_API_KEY = [your key here] GEMINI_API_KEY = [your key here] # Use Gemini implementation BOT_IMPLEMENTATION = gemini Server Setup (server.py) server.py is a FastAPI server that creates the meeting room where clients and bots interact, manages bot instances, and handles client connections. It’s the orchestrator that brings everything on the server-side together. Creating Meeting Room The server uses Daily’s API via a REST API helper to create rooms where clients and bots can meet. Each room is a secure space for audio/video communication: server/server.py Copy Ask AI async def create_room_and_token (): """Create a Daily room and generate access credentials.""" room = await daily_helpers[ "rest" ].create_room(DailyRoomParams()) token = await daily_helpers[ "rest" ].get_token(room.url) return room.url, token Managing Bot Instances When a client connects, the server starts a new bot instance configured specifically for that room. It keeps track of running bots and ensures there’s only one bot per room: server/server.py Copy Ask AI # Start the bot process for a specific room bot_file = "bot-gemini.py" proc = subprocess.Popen([ f "python3 -m { bot_file } -u { room_url } -t { token } " ]) bot_procs[proc.pid] = (proc, room_url) Connection Endpoints The server provides two ways to connect: Browser Access (/) Creates a room, starts a bot, and redirects the browser to the Daily meeting URL. Perfect for quick testing and development. RTVI Client (/connect) Creates a room, starts a bot, and returns connection credentials. Used by RTVI clients for custom implementations. Bot Implementation (bot-gemini.py) The bot implementation connects all the pieces: Daily transport, Gemini service, conversation context, and processors. Let’s break down each component: Transport Setup First, we configure the Daily transport, which handles WebRTC communication between the client and server. 
server/bot-gemini.py Copy Ask AI transport = DailyTransport( room_url, token, "Chatbot" , DailyParams( audio_in_enabled = True , # Enable audio input audio_out_enabled = True , # Enable audio output video_out_enabled = True , # Enable video output vad_analyzer = SileroVADAnalyzer( params = VADParams( stop_secs = 0.5 )), ), ) Gemini Multimodal Live audio requirements: Input: 16 kHz sample rate Output: 24 kHz sample rate Gemini Service Configuration Next, we initialize the Gemini service which will provide speech-to-speech inference and communication: server/bot-gemini.py Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GEMINI_API_KEY" ), voice_id = "Puck" , # Choose your bot's voice params = InputParams( temperature = 0.7 ) # Set model input params ) Conversation Context We give our bot its personality and initial instructions: server/bot-gemini.py Copy Ask AI messages = [{ "role" : "user" , "content" : """You are Chatbot, a friendly, helpful robot. Keep responses brief and avoid special characters since output will be converted to audio.""" }] context = OpenAILLMContext(messages) context_aggregator = llm.create_context_aggregator(context) OpenAILLMContext is used as a common LLM base service for context management. In the future, we may add a specific context manager for Gemini. The context aggregator automatically maintains conversation history, helping the bot remember previous interactions. Processor Setup We initialize two additional processors in our pipeline to handle different aspects of the interaction: RTVI Processors RTVIProcessor : Handles all client communication events including transcriptions, speaking states, and performance metrics Animation TalkingAnimation : Controls the bot’s visual state, switching between static and animated frames based on speaking status Learn more about the RTVI framework and available processors. Pipeline Assembly Finally, we bring everything together in a pipeline: server/bot-gemini.py Copy Ask AI pipeline = Pipeline([ transport.input(), # Receive media rtvi, # Client UI events context_aggregator.user(), # Process user context llm, # Gemini processing ta, # Animation (talking/quiet states) transport.output(), # Send media context_aggregator.assistant() # Process bot context ]) task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True , enable_metrics = True , enable_usage_metrics = True , ), observers = [RTVIObserver(rtvi)], ) The order of processors is crucial! For example, the RTVI processor should be early in the pipeline to capture all relevant events. The RTVIObserver monitors the entire pipeline and automatically collects relevant events to send to the client. Client Implementation Our React client uses the Pipecat React SDK to communicate with the bot. Let’s explore how the client connects and interacts with our Pipecat server. Connection Setup The client needs to connect to our bot server using the same transport type (Daily WebRTC) that we configured on the server: examples/react/src/providers/PipecatProvider.tsx Copy Ask AI const client = new PipecatClient ({ transport: new DailyTransport (), enableMic: true , // Enable audio input enableCam: false , // Disable video input enableScreenShare: false , // Disable screen sharing }); client . 
connect ({ endpoint: "http://localhost:7860/connect" , // Your bot connection endpoint }); The connection configuration must match your server: DailyTransport : Matches the WebRTC transport used in bot-gemini.py connect endpoint: Matches the /connect route in server.py Media settings: Controls which devices are enabled on join Media Handling Pipecat’s React components handle all the complex media stream management for you: Copy Ask AI function App () { return ( < PipecatClientProvider client = { client } > < div className = "app" > < PipecatClientVideo participant = "bot" /> { /* Bot's video feed */ } < PipecatClientAudio /> { /* Audio input/output */ } </ div > </ PipecatClientProvider > ); } The PipecatClientProvider is the root component for providing Pipecat client context to your application. By wrapping your PipecatClientAudio and PipecatClientVideo components in this provider, they can access the client instance and receive and process the streams received from the Pipecat server. Real-time Events The RTVI processors we configured in the pipeline emit events that we can handle in our client: Copy Ask AI // Listen for transcription events useRTVIClientEvent ( RTVIEvent . UserTranscript , ( data : TranscriptData ) => { if ( data . final ) { console . log ( `User said: ${ data . text } ` ); } }); // Listen for bot responses useRTVIClientEvent ( RTVIEvent . BotTranscript , ( data : BotLLMTextData ) => { console . log ( `Bot responded: ${ data . text } ` ); }); Available Events Speaking state changes Transcription updates Bot responses Connection status Performance metrics Event Usage Use these events to: Show speaking indicators Display transcripts Update UI state Monitor performance Optionally, uses callbacks to handle events in your application. Learn more in the Pipecat client docs. Complete Example Here’s a basic client implementation with connection status and transcription display: Copy Ask AI function ChatApp () { return ( < PipecatClientProvider client = { client } > < div className = "app" > { /* Connection UI */ } < StatusDisplay /> < ConnectButton /> { /* Media Components */ } < BotVideo /> < PipecatClientAudio /> { /* Debug/Transcript Display */ } < DebugDisplay /> </ div > </ PipecatClientProvider > ); } Check out the example repository for a complete client implementation with styling and error handling. Running the Application From the simple-chatbot directory, start the server and client to test the chatbot: 1. Start the Server In one terminal: Copy Ask AI python server/server.py 2. Start the Client In another terminal: Copy Ask AI cd examples/react npm install npm run dev 3. 
Testing the Connection Open http://localhost:5173 in your browser Click “Connect” to join a room Allow microphone access when prompted Start talking with your AI assistant Troubleshooting: Check that all API keys are properly configured in .env Grant your browser access to your microphone, so it can receive your audio input Verify WebRTC ports aren’t blocked by firewalls Next Steps Now that you have a working chatbot, consider these enhancements: Add custom avatar animations Implement function calling for external integrations Add support for multiple languages Enhance error recovery and reconnection logic Examples Foundational Example A basic implementation demonstrating core Gemini Multimodal Live features and transcription capabilities Simple Chatbot A complete client/server implementation showing how to build a Pipecat JS or React client that connects to a Gemini Live Pipecat bot Learn More Gemini Multimodal Live API Reference React Client SDK Documentation Recording Transcripts Metrics On this page What We’ll Build Key Concepts Understanding Pipelines Processors Gemini Integration Prerequisites Server Implementation Environment Setup Server Setup (server.py) Creating Meeting Room Managing Bot Instances Connection Endpoints Bot Implementation (bot-gemini.py) Transport Setup Gemini Service Configuration Conversation Context Processor Setup Pipeline Assembly Client Implementation Connection Setup Media Handling Real-time Events Complete Example Running the Application 1. Start the Server 2. Start the Client 3. Testing the Connection Next Steps Examples Learn More Assistant Responses are generated using AI and may contain mistakes.
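Related to the error-recovery suggestion above, a common small addition is shutting the pipeline down cleanly when the user leaves the room. A minimal sketch, assuming the transport and task objects created in bot-gemini.py and the Daily transport's on_participant_left event (check the transport reference for the exact handler signature):

@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
    # End the bot's pipeline when the human participant leaves the Daily room
    await task.cancel()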
features_gemini-multimodal-live_37e7755d.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/guides/features/gemini-multimodal-live#running-the-application
2 |
+
Title: Building with Gemini Multimodal Live - Pipecat
3 |
+
==================================================
4 |
+
5 |
+
Building with Gemini Multimodal Live - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Building with Gemini Multimodal Live Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal This guide will walk you through building a real-time AI chatbot using Gemini Multimodal Live and Pipecat. We’ll create a complete application with a Pipecat server and a Pipecat React client that enables natural conversations with an AI assistant. API Reference Gemini Multimodal Live API documentation Example Code Find the complete client and server code in Github Client SDK Pipecat React SDK documentation What We’ll Build In this guide, you’ll create: A FastAPI server that manages bot instances A Gemini-powered conversational AI bot A React client with real-time audio/video A complete pipeline for speech-to-speech interaction Key Concepts Before we dive into implementation, let’s cover some important concepts that will help you understand how Pipecat and Gemini work together. Understanding Pipelines At the heart of Pipecat is the pipeline system. A pipeline is a sequence of processors that handle different aspects of the conversation flow. Think of it like an assembly line where each station (processor) performs a specific task. For our chatbot, the pipeline looks like this: Copy Ask AI pipeline = Pipeline([ transport.input(), # Receives audio/video from the user via WebRTC rtvi, # Handles client/server messaging and events context_aggregator.user(), # Manages user message history llm, # Processes speech through Gemini talking_animation, # Controls bot's avatar transport.output(), # Sends audio/video back to the user via WebRTC context_aggregator.assistant(), # Manages bot message history ]) Processors Each processor in the pipeline handles a specific task: Transport transport.input() and transport.output() handle media streaming with Daily Context context_aggregator maintains conversation history for natural dialogue Speech Processing rtvi_user_transcription and rtvi_bot_transcription handle speech-to-text Animation talking_animation controls the bot’s visual state based on speaking activity The order of processors matters! Data flows through the pipeline in sequence, so each processor should receive the data it needs from previous processors. Learn more about the Core Concepts to Pipecat server. Gemini Integration The GeminiMultimodalLiveLLMService is a speech-to-speech LLM service that interfaces with the Gemini Multimodal Live API. It provides: Real-time speech-to-speech conversation Context management Voice activity detection Tool use Pipecat manages two types of connections: A WebRTC connection between the Pipecat client and server for reliable audio/video streaming A WebSocket connection between the Pipecat server and Gemini for real-time AI processing This architecture ensures stable media streaming while maintaining responsive AI interactions. 
Prerequisites Before we begin, you’ll need: Python 3.10 or higher Node.js 16 or higher A Daily API key A Google API key with Gemini Multimodal Live access Clone the Pipecat repo: Copy Ask AI git clone [email protected]:pipecat-ai/pipecat.git Server Implementation Let’s start by setting up the server components. Our server will handle bot management, room creation, and client connections. Environment Setup Navigate to the simple-chatbot’s server directory: Copy Ask AI cd examples/simple-chatbot/server Set up a python virtual environment: Copy Ask AI python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate Install requirements: Copy Ask AI pip install -r requirements.txt Copy env.example to .env and make a few changes: Copy Ask AI # Remove the hard-coded example room URL DAILY_SAMPLE_ROOM_URL = # Add your Daily and Gemini API keys DAILY_API_KEY = [your key here] GEMINI_API_KEY = [your key here] # Use Gemini implementation BOT_IMPLEMENTATION = gemini Server Setup (server.py) server.py is a FastAPI server that creates the meeting room where clients and bots interact, manages bot instances, and handles client connections. It’s the orchestrator that brings everything on the server-side together. Creating Meeting Room The server uses Daily’s API via a REST API helper to create rooms where clients and bots can meet. Each room is a secure space for audio/video communication: server/server.py Copy Ask AI async def create_room_and_token (): """Create a Daily room and generate access credentials.""" room = await daily_helpers[ "rest" ].create_room(DailyRoomParams()) token = await daily_helpers[ "rest" ].get_token(room.url) return room.url, token Managing Bot Instances When a client connects, the server starts a new bot instance configured specifically for that room. It keeps track of running bots and ensures there’s only one bot per room: server/server.py Copy Ask AI # Start the bot process for a specific room bot_file = "bot-gemini.py" proc = subprocess.Popen([ f "python3 -m { bot_file } -u { room_url } -t { token } " ]) bot_procs[proc.pid] = (proc, room_url) Connection Endpoints The server provides two ways to connect: Browser Access (/) Creates a room, starts a bot, and redirects the browser to the Daily meeting URL. Perfect for quick testing and development. RTVI Client (/connect) Creates a room, starts a bot, and returns connection credentials. Used by RTVI clients for custom implementations. Bot Implementation (bot-gemini.py) The bot implementation connects all the pieces: Daily transport, Gemini service, conversation context, and processors. Let’s break down each component: Transport Setup First, we configure the Daily transport, which handles WebRTC communication between the client and server. 
server/bot-gemini.py Copy Ask AI transport = DailyTransport( room_url, token, "Chatbot" , DailyParams( audio_in_enabled = True , # Enable audio input audio_out_enabled = True , # Enable audio output video_out_enabled = True , # Enable video output vad_analyzer = SileroVADAnalyzer( params = VADParams( stop_secs = 0.5 )), ), ) Gemini Multimodal Live audio requirements: Input: 16 kHz sample rate Output: 24 kHz sample rate Gemini Service Configuration Next, we initialize the Gemini service which will provide speech-to-speech inference and communication: server/bot-gemini.py Copy Ask AI llm = GeminiMultimodalLiveLLMService( api_key = os.getenv( "GEMINI_API_KEY" ), voice_id = "Puck" , # Choose your bot's voice params = InputParams( temperature = 0.7 ) # Set model input params ) Conversation Context We give our bot its personality and initial instructions: server/bot-gemini.py Copy Ask AI messages = [{ "role" : "user" , "content" : """You are Chatbot, a friendly, helpful robot. Keep responses brief and avoid special characters since output will be converted to audio.""" }] context = OpenAILLMContext(messages) context_aggregator = llm.create_context_aggregator(context) OpenAILLMContext is used as a common LLM base service for context management. In the future, we may add a specific context manager for Gemini. The context aggregator automatically maintains conversation history, helping the bot remember previous interactions. Processor Setup We initialize two additional processors in our pipeline to handle different aspects of the interaction: RTVI Processors RTVIProcessor : Handles all client communication events including transcriptions, speaking states, and performance metrics Animation TalkingAnimation : Controls the bot’s visual state, switching between static and animated frames based on speaking status Learn more about the RTVI framework and available processors. Pipeline Assembly Finally, we bring everything together in a pipeline: server/bot-gemini.py Copy Ask AI pipeline = Pipeline([ transport.input(), # Receive media rtvi, # Client UI events context_aggregator.user(), # Process user context llm, # Gemini processing ta, # Animation (talking/quiet states) transport.output(), # Send media context_aggregator.assistant() # Process bot context ]) task = PipelineTask( pipeline, params = PipelineParams( allow_interruptions = True , enable_metrics = True , enable_usage_metrics = True , ), observers = [RTVIObserver(rtvi)], ) The order of processors is crucial! For example, the RTVI processor should be early in the pipeline to capture all relevant events. The RTVIObserver monitors the entire pipeline and automatically collects relevant events to send to the client. Client Implementation Our React client uses the Pipecat React SDK to communicate with the bot. Let’s explore how the client connects and interacts with our Pipecat server. Connection Setup The client needs to connect to our bot server using the same transport type (Daily WebRTC) that we configured on the server: examples/react/src/providers/PipecatProvider.tsx Copy Ask AI const client = new PipecatClient ({ transport: new DailyTransport (), enableMic: true , // Enable audio input enableCam: false , // Disable video input enableScreenShare: false , // Disable screen sharing }); client . 
connect ({ endpoint: "http://localhost:7860/connect" , // Your bot connection endpoint }); The connection configuration must match your server: DailyTransport : Matches the WebRTC transport used in bot-gemini.py connect endpoint: Matches the /connect route in server.py Media settings: Controls which devices are enabled on join Media Handling Pipecat’s React components handle all the complex media stream management for you: Copy Ask AI function App () { return ( < PipecatClientProvider client = { client } > < div className = "app" > < PipecatClientVideo participant = "bot" /> { /* Bot's video feed */ } < PipecatClientAudio /> { /* Audio input/output */ } </ div > </ PipecatClientProvider > ); } The PipecatClientProvider is the root component for providing Pipecat client context to your application. By wrapping your PipecatClientAudio and PipecatClientVideo components in this provider, they can access the client instance and receive and process the streams received from the Pipecat server. Real-time Events The RTVI processors we configured in the pipeline emit events that we can handle in our client: Copy Ask AI // Listen for transcription events useRTVIClientEvent ( RTVIEvent . UserTranscript , ( data : TranscriptData ) => { if ( data . final ) { console . log ( `User said: ${ data . text } ` ); } }); // Listen for bot responses useRTVIClientEvent ( RTVIEvent . BotTranscript , ( data : BotLLMTextData ) => { console . log ( `Bot responded: ${ data . text } ` ); }); Available Events Speaking state changes Transcription updates Bot responses Connection status Performance metrics Event Usage Use these events to: Show speaking indicators Display transcripts Update UI state Monitor performance Optionally, uses callbacks to handle events in your application. Learn more in the Pipecat client docs. Complete Example Here’s a basic client implementation with connection status and transcription display: Copy Ask AI function ChatApp () { return ( < PipecatClientProvider client = { client } > < div className = "app" > { /* Connection UI */ } < StatusDisplay /> < ConnectButton /> { /* Media Components */ } < BotVideo /> < PipecatClientAudio /> { /* Debug/Transcript Display */ } < DebugDisplay /> </ div > </ PipecatClientProvider > ); } Check out the example repository for a complete client implementation with styling and error handling. Running the Application From the simple-chatbot directory, start the server and client to test the chatbot: 1. Start the Server In one terminal: Copy Ask AI python server/server.py 2. Start the Client In another terminal: Copy Ask AI cd examples/react npm install npm run dev 3. 
Testing the Connection Open http://localhost:5173 in your browser Click “Connect” to join a room Allow microphone access when prompted Start talking with your AI assistant Troubleshooting: Check that all API keys are properly configured in .env Grant your browser access to your microphone, so it can receive your audio input Verify WebRTC ports aren’t blocked by firewalls Next Steps Now that you have a working chatbot, consider these enhancements: Add custom avatar animations Implement function calling for external integrations Add support for multiple languages Enhance error recovery and reconnection logic Examples Foundational Example A basic implementation demonstrating core Gemini Multimodal Live features and transcription capabilities Simple Chatbot A complete client/server implementation showing how to build a Pipecat JS or React client that connects to a Gemini Live Pipecat bot Learn More Gemini Multimodal Live API Reference React Client SDK Documentation Recording Transcripts Metrics On this page What We’ll Build Key Concepts Understanding Pipelines Processors Gemini Integration Prerequisites Server Implementation Environment Setup Server Setup (server.py) Creating Meeting Room Managing Bot Instances Connection Endpoints Bot Implementation (bot-gemini.py) Transport Setup Gemini Service Configuration Conversation Context Processor Setup Pipeline Assembly Client Implementation Connection Setup Media Handling Real-time Events Complete Example Running the Application 1. Start the Server 2. Start the Client 3. Testing the Connection Next Steps Examples Learn More Assistant Responses are generated using AI and may contain mistakes.
features_krisp_340dc07e.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/guides/features/krisp#troubleshooting
2 |
+
Title: Noise cancellation with Krisp - Pipecat
3 |
+
==================================================
4 |
+
5 |
+
Noise cancellation with Krisp - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Noise cancellation with Krisp Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Overview This guide will walk you through setting up and using Krisp’s noise reduction capabilities in your Pipecat application. Krisp provides professional-grade noise cancellation that can significantly improve audio quality in real-time communications. To use Krisp’s noise cancellation, you’ll need to obtain their SDK and models through a Krisp developer account. Our Pipecat Krisp module simplifies the integration process, which we’ll cover in this guide. Walkthrough Get Access to Krisp SDK and Models Create a Krisp developers account at krisp.ai/developers Download the Krisp Desktop SDK (v7) that matches your platform: Linux: Desktop SDK v7.0.2: Linux macOS (ARM): Desktop SDK v7.0.1: Mac ARM macOS (Intel): Desktop SDK v7.0.1: Mac Intel Windows(ARM): Desktop SDK v7.0.2: Windows ARM Windows (x64): Desktop SDK v7.0.2: Windows Download the corresponding models. We recommend trying the Background Voice Cancellation model. Recommended model for each platform: Linux (ARM): hs.c6.f.s.de56df.kw Mac (ARM): hs.c6.f.s.de56df.kw Linux (x86_64): hs.c6.f.s.de56df.kw Mac (x86_64): hs.c6.f.s.de56df.kw Windows (x86_64): hs.c6.f.s.de56df.kw Install build dependencies The pipecat-ai-krisp module is a python wrapper around Krisp’s C++ SDK. To build the module, you’ll need to install the following dependencies: macOS Ubuntu/Debian Windows Copy Ask AI # Using Homebrew brew install cmake pybind11 Copy Ask AI # Using Homebrew brew install cmake pybind11 Copy Ask AI # Using apt sudo apt-get update sudo apt-get install cmake python3-dev pybind11-dev g++ Copy Ask AI # Using Chocolatey choco install cmake # pybind11 will be installed via pip during the build process # OR using Visual Studio # 1. Install Visual Studio with C++ development tools # 2. Install CMake from https://cmake.org/download/ # pybind11 will be installed via pip during the build process For Windows users: Make sure you have Visual Studio installed with C++ development tools, or alternatively, have the Visual C++ Build Tools installed. Install the pipecat-ai-krisp module In your Pipecat repo, activate your virtual environment: macOS/Linux Windows Copy Ask AI python3 -m venv venv source venv/bin/activate Copy Ask AI python3 -m venv venv source venv/bin/activate Copy Ask AI # Using PowerShell python - m venv venv .\venv\Scripts\Activate.ps1 # OR using Command Prompt python - m venv venv .\venv\Scripts\ activate.bat # OR using Git Bash python - m venv venv source venv / Scripts / activate Export the path to your Krisp SDK and model files: macOS/Linux Windows Copy Ask AI # Add to your .env file or shell configuration export KRISP_SDK_PATH = / PATH / TO / KRISP / SDK export KRISP_MODEL_PATH = / PATH / TO / KRISP / MODEL . 
kef Copy Ask AI # Add to your .env file or shell configuration export KRISP_SDK_PATH = / PATH / TO / KRISP / SDK export KRISP_MODEL_PATH = / PATH / TO / KRISP / MODEL . kef Copy Ask AI # Using PowerShell $ env: KRISP_SDK_PATH = "C:\PATH\TO\KRISP\SDK" $ env: KRISP_MODEL_PATH = "C:\PATH\TO\KRISP\MODEL.kef" # OR using Command Prompt set KRISP_SDK_PATH = C:\PATH\TO\KRISP\SDK set KRISP_MODEL_PATH = C:\PATH\TO\KRISP\MODEL.kef When selecting a KRISP_MODEL_PATH , ensure that you’re selecting the actual model file, not just the directory. The path should look something like this: ARM Linux Mac ARM Windows Copy Ask AI export KRISP_MODEL_PATH = ./ krisp / hs . c6 . f . s . de56df . kw Copy Ask AI export KRISP_MODEL_PATH = ./ krisp / hs . c6 . f . s . de56df . kw export Copy Ask AI KRISP_MODEL_PATH = ./krisp/outbound-bvc-models-fp16/hs.c6.f.s.de56df.bucharest.kef Copy Ask AI $ env: KRISP_MODEL_PATH = "C:\krisp\outbound-bvc-models-fp32\hs.c6.f.s.de56df.bucharest.kef" Next, install the pipecat-ai[krisp] module, which will automatically build the pipecat-ai-krisp python wrapper module: Copy Ask AI pip install "pipecat-ai[krisp]" Test the integration You can now test the Krisp integration. The easiest way to do this is to run the foundational example: 07p-interruptible-krisp.py . You can run a foundational example by running the following command: Copy Ask AI python examples/foundational/07p-interruptible-krisp.py -u YOUR_DAILY_ROOM_URL Important for macOS users If you’re running on macOS you may receive a security warning about running the script. This is expected. You can allow access by going to System Settings > Privacy & Security then click Allow Anyway to permit the example to run. After allowing and re-running, you may get a pop-up asking for permission. Select Open Anyway to run the script. Usage Example Here’s a basic example of how to add Krisp noise reduction to your Pipecat pipeline: Copy Ask AI from pipecat.audio.filters.krisp_filter import KrispFilter from pipecat.transports.services.daily import DailyParams, DailyTransport # Add to transport configuration transport = DailyTransport( room_url, token, "Audio Bot" , DailyParams( audio_in_filter = KrispFilter(), # Enable Krisp noise reduction audio_in_enabled = True , audio_out_enabled = True , vad_analyzer = SileroVADAnalyzer(), ), ) Troubleshooting Common issues and solutions: Missing Dependencies Copy Ask AI Error: Missing module: pipecat_ai_krisp Solution: Ensure you’ve installed with the krisp extra: pip install "pipecat-ai[krisp]" Model Path Not Found Copy Ask AI Error: Model path for KrispAudioProcessor must be provided Solution: Set the KRISP_MODEL_PATH environment variable or provide it in the constructor SDK Path Issues Copy Ask AI Error: Cannot find Krisp SDK Solution: Verify KRISP_SDK_PATH points to a valid Krisp SDK installation Additional Resources KrispFilter Reference Documentation Example Application Metrics OpenAI Audio Models and APIs On this page Overview Walkthrough Get Access to Krisp SDK and Models Install build dependencies Install the pipecat-ai-krisp module Test the integration Important for macOS users Usage Example Troubleshooting Additional Resources Assistant Responses are generated using AI and may contain mistakes.
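As the troubleshooting section notes, the model path can also be supplied directly to the filter rather than through the KRISP_MODEL_PATH environment variable. A minimal sketch, assuming a model_path keyword argument and a hypothetical local model location (see the KrispFilter reference for the exact parameter names):

import os

from pipecat.audio.filters.krisp_filter import KrispFilter

# Point at the model file itself, not its directory (hypothetical path)
model_path = os.path.join("krisp", "hs.c6.f.s.de56df.kw")

krisp_filter = KrispFilter(model_path=model_path)

# Then attach it to the transport exactly as in the usage example above:
# DailyParams(audio_in_filter=krisp_filter, audio_in_enabled=True, ...)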
features_pipecat-flows_07c1d855.txt
ADDED
@@ -0,0 +1,5 @@
1 |
+
URL: https://docs.pipecat.ai/guides/features/pipecat-flows#function-configuration
2 |
+
Title: Pipecat Flows - Pipecat
3 |
+
==================================================
4 |
+
5 |
+
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Flows provides a framework for building structured conversations in your AI applications. It enables you to create both predefined conversation paths and dynamically generated flows while handling the complexities of state management and LLM interactions. The framework consists of: A Python module for building conversation flows with Pipecat A visual editor for designing and exporting flow configurations Key Concepts Nodes : Represent conversation states with specific messages and available functions Messages : Set the role and tasks for each node Functions : Define actions and transitions (Node functions for operations, Edge functions for transitions) Actions : Execute operations during state transitions (pre/post actions) State Management : Handle conversation state and data persistence Example Flows Movie Explorer (Static) A static flow demonstrating movie exploration using OpenAI. Shows real API integration with TMDB, structured data collection, and state management. Insurance Policy (Dynamic) A dynamic flow using Google Gemini that adapts policy recommendations based on user responses. Demonstrates runtime node creation and conditional paths. These examples are fully functional and can be run locally. Make sure you have the required dependencies installed and API keys configured. When to Use Static vs Dynamic Flows Static Flows are ideal when: Conversation structure is known upfront Paths follow predefined patterns Flow can be fully configured in advance Example: Customer service scripts, intake forms Dynamic Flows are better when: Paths depend on external data Flow structure needs runtime modification Complex decision trees are involved Example: Personalized recommendations, adaptive workflows Installation If you’re already using Pipecat: Copy Ask AI pip install pipecat-ai-flows If you’re starting fresh: Copy Ask AI # Basic installation pip install pipecat-ai-flows # Install Pipecat with specific LLM provider options: pip install "pipecat-ai[daily,openai,deepgram]" # For OpenAI pip install "pipecat-ai[daily,anthropic,deepgram]" # For Anthropic pip install "pipecat-ai[daily,google,deepgram]" # For Google 💡 Want to design your flows visually? Try the online Flow Editor Core Concepts Designing Conversation Flows Functions in Pipecat Flows serve two key purposes: Processing data (likely by interfacing with external systems and APIs) Advancing the conversation to the next node Each function can do one or both. LLMs decide when to run each function, via their function calling (or tool calling) mechanism. Defining a Function A function is expected to return a (result, next_node) tuple. 
More precisely, it’s expected to return: Copy Ask AI # (result, next_node) Tuple[Optional[FlowResult], Optional[Union[NodeConfig, str ]]] If the function processes data, it should return a non- None value for the first element of the tuple. This value should be a FlowResult or subclass. If the function advances the conversation to the next node, it should return a non- None value for the second element of the tuple. This value can be either: A NodeConfig defining the next node (for dynamic flows) A string identifying the next node (for static flows) Example Function Copy Ask AI from pipecat.frames.frames import TTSSpeakFrame from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig async def check_availability ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: # Read arguments date = args[ "date" ] time = args[ "time" ] # Read previously-stored data party_size = flow_manager.state.get( "party_size" ) # Use flow_manager for immediate user feedback await flow_manager.task.queue_frame(TTSSpeakFrame( "Checking our reservation system..." )) # Store data in flow state for later use flow_manager.state[ "requested_date" ] = date # Interface with reservation system is_available = await reservation_system.check_availability(date, time, party_size) # Assemble result result = { "status" : "success" , "available" : is_available } # Decide which node to go to next if is_available: next_node = create_confirmation_node() else : next_node = create_no_availability_node() # Return both result and next node return result, next_node Node Structure Each node in your flow represents a conversation state and consists of three main components: Messages Nodes use two types of messages to control the conversation: Role Messages : Define the bot’s personality or role (optional) Copy Ask AI "role_messages" : [ { "role" : "system" , "content" : "You are a friendly pizza ordering assistant. Keep responses casual and upbeat." } ] Task Messages : Define what the bot should do in the current node Copy Ask AI "task_messages" : [ { "role" : "system" , "content" : "Ask the customer which pizza size they'd like: small, medium, or large." } ] Role messages are typically defined in your initial node and inherited by subsequent nodes, while task messages are specific to each node’s purpose. Functions Functions in Pipecat Flows can: Process data Specify node transitions Do both This leads to two conceptual types of functions: Node functions , which only process data. Edge functions , which also (or only) transition to the next node. The function itself ( which you can read more about here ) is usually wrapped in a function configuration, which also contains some metadata about the function. Function Configuration Pipecat Flows supports three ways of specifying function configuration: Provider-specific dictionary format Copy Ask AI # Dictionary format { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } FlowsFunctionSchema Copy Ask AI # Using FlowsFunctionSchema from pipecat_flows import FlowsFunctionSchema size_function = FlowsFunctionSchema( name = "select_size" , description = "Select pizza size" , properties = { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} }, required = [ "size" ], handler = select_size ) # Use in node configuration node_config = { "task_messages" : [ ...
], "functions" : [size_function] } The FlowsFunctionSchema approach provides some advantages over the provider-specific dictionary format: Consistent structure across LLM providers Simplified parameter definition Cleaner, more readable code Both dictionary and FlowsFunctionSchema approaches are fully supported. FlowsFunctionSchema is recommended for new projects as it provides better type checking and a provider-independent format. Direct function usage (auto-configuration) This approach lets you bypass specifying a standalone function configuration. Instead, relevant function metadata is automatically extracted from the function’s signature and docstring: name description properties (including individual property description s) required Note that the function signature is a bit different when using direct functions. The first parameter is the FlowManager , followed by any others necessary for the function. Copy Ask AI from pipecat_flows import FlowManager, FlowResult async def select_pizza_order ( flow_manager : FlowManager, size : str , pizza_type : str , additional_toppings : list[ str ] = [], ) -> tuple[FlowResult, str ]: """ Record the pizza order details. Args: size (str): Size of the pizza. Must be one of "small", "medium", or "large". pizza_type (str): Type of pizza. Must be one of "pepperoni", "cheese", "supreme", or "vegetarian". additional_toppings (list[str]): List of additional toppings. Defaults to empty list. """ ... # Use in node configuration node_config = { "task_messages" : [ ... ], "functions" : [select_pizza_order] } Node Functions Functions that process data within a single conversational state, without switching nodes. When called, they: Execute their handler to do the data processing (typically by interfacing with an external system or API) Trigger an immediate LLM completion with the result Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, None ]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) return { "status" : "success" , "size" : size }, None # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Edge Functions Functions that specify a transition between nodes (optionally processing data first). 
When called, they: Execute their handler to do any data processing (optional) and determine the next node Add the function result to the LLM context Trigger LLM completion after both the function result and the next node’s messages are in the context Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) result = { "status" : "success" , "size" : size } next_node = create_confirmation_node() return result, next_node # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Actions Actions are operations that execute as part of the lifecycle of a node, with two distinct timing options: Pre-actions: execute when entering the node, before the LLM completion Post-actions: execute after the LLM completion Pre-Actions Execute when entering the node, before LLM inference. Useful for: Providing immediate feedback while waiting for LLM responses Bridging gaps during longer function calls Setting up state or context Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." # Immediate feedback during processing } ], Note that when the node is configured with respond_immediately: False , the pre_actions still run when entering the node, which may be well before LLM inference, depending on how long the user takes to speak first. Avoid mixing tts_say actions with chat completions as this may result in a conversation flow that feels unnatural. tts_say are best used as filler words when the LLM will take time to generate an completion. Post-Actions Execute after LLM inference completes. Useful for: Cleanup operations State finalization Ensuring proper sequence of operations Copy Ask AI "post_actions" : [ { "type" : "end_conversation" # Ensures TTS completes before ending } ] Note that when the node is configured with respond_immediately: False , the post_actions still only run after the first LLM inference, which may be a while depending on how long the user takes to speak first. Timing Considerations Pre-actions : Execute immediately, before any LLM processing begins LLM Inference : Processes the node’s messages and functions Post-actions : Execute after LLM processing and TTS completion For example, when using end_conversation as a post-action, the sequence is: LLM generates response TTS speaks the response End conversation action executes This ordering ensures proper completion of all operations. Action Types Flows comes equipped with pre-canned actions and you can also define your own action behavior. See the reference docs for more information. Deciding Who Speaks First For each node in the conversation, you can decide whether the LLM should respond immediately upon entering the node (the default behavior) or whether the LLM should wait for the user to speak first before responding. You do this using the respond_immediately field. respond_immediately=False may be particularly useful in the very first node, especially in outbound-calling cases where the user has to first answer the phone to trigger the conversation. 
Copy Ask AI NodeConfig( task_messages = [ { "role" : "system" , "content" : "Warmly greet the customer and ask how many people are in their party. This is your only job for now; if the customer asks for something else, politely remind them you can't do it." , } ], respond_immediately = False , # ... other fields ) Keep in mind that if you specify respond_immediately=False , the user may not be aware of the conversational task at hand when entering the node (the bot hasn’t told them yet). While it’s always important to have guardrails in your node messages to keep the conversation on topic, letting the user speak first makes it even more so. Context Management Pipecat Flows provides three strategies for managing conversation context during node transitions: Context Strategies APPEND (default): Adds new messages to the existing context, maintaining the full conversation history RESET : Clears the context and starts fresh with the new node’s messages RESET_WITH_SUMMARY : Resets the context but includes an AI-generated summary of the previous conversation Configuration Context strategies can be configured globally or per-node: Copy Ask AI from pipecat_flows import ContextStrategy, ContextStrategyConfig # Global strategy configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, context_strategy = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far, focusing on decisions made and important information collected." ) ) # Per-node strategy configuration node_config = { "task_messages" : [ ... ], "functions" : [ ... ], "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Provide a concise summary of the customer's order details and preferences." ) } Strategy Selection Choose your strategy based on your conversation needs: Use APPEND when full conversation history is important Use RESET when previous context might confuse the current node’s purpose Use RESET_WITH_SUMMARY for long conversations where key points need to be preserved When using RESET_WITH_SUMMARY, if summary generation fails or times out, the system automatically falls back to RESET strategy for resilience. State Management The state variable in FlowManager is a shared dictionary that persists throughout the conversation. 
Think of it as a conversation memory that lets you: Store user information Track conversation progress Share data between nodes Inform decision-making Here’s a practical example of a pizza ordering flow: Copy Ask AI # Store user choices as they're made async def select_size ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle pizza size selection.""" size = args[ "size" ] # Initialize order in state if it doesn't exist if "order" not in flow_manager.state: flow_manager.state[ "order" ] = {} # Store the selection flow_manager.state[ "order" ][ "size" ] = size return { "status" : "success" , "size" : size}, "toppings" async def select_toppings ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle topping selection.""" topping = args[ "topping" ] # Get existing order and toppings order = flow_manager.state.get( "order" , {}) toppings = order.get( "toppings" , []) # Add new topping toppings.append(topping) order[ "toppings" ] = toppings flow_manager.state[ "order" ] = order return { "status" : "success" , "toppings" : toppings}, "finalize" async def finalize_order ( args : FlowArgs) -> tuple[FlowResult, str ]: """Process the complete order.""" order = flow_manager.state.get( "order" , {}) # Validate order has required information if "size" not in order: return { "status" : "error" , "error" : "No size selected" } # Calculate price based on stored selections size = order[ "size" ] toppings = order.get( "toppings" , []) price = calculate_price(size, len (toppings)) return { "status" : "success" , "summary" : f "Ordered: { size } pizza with { ', ' .join(toppings) } " , "price" : price }, "end" In this example: select_size initializes the order and stores the size select_toppings builds a list of toppings finalize_order uses the stored information to process the complete order The state variable makes it easy to: Build up information across multiple interactions Access previous choices when needed Validate the complete order Calculate final results This is particularly useful when information needs to be collected across multiple conversation turns or when later decisions depend on earlier choices. LLM Provider Support Pipecat Flows automatically handles format differences between LLM providers: OpenAI Format Copy Ask AI "functions" : [{ "type" : "function" , "function" : { "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } } }] Anthropic Format Copy Ask AI "functions" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "input_schema" : { ... } }] Google (Gemini) Format Copy Ask AI "functions" : [{ "function_declarations" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } }] }] You don’t need to handle these differences manually - Pipecat Flows adapts your configuration to the correct format based on your LLM provider. Implementation Approaches Static Flows Static flows use a configuration-driven approach where the entire conversation structure is defined upfront. Basic Setup Copy Ask AI from pipecat_flows import FlowManager # Define flow configuration flow_config = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... 
] } } } # Initialize flow manager with static configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize() Example FlowConfig Copy Ask AI flow_config = { "initial_node" : "start" , "nodes" : { "start" : { "role_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. You must ALWAYS use the available functions to progress the conversation. This is a phone conversation and your responses will be converted to audio. Keep the conversation friendly, casual, and polite. Avoid outputting special characters and emojis." , } ], "task_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. Ask if they want pizza or sushi." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "choose_pizza" , "handler" : choose_pizza, # Returns [None, "pizza_order"] "description" : "User wants pizza" , "parameters" : { "type" : "object" , "properties" : {}} } } ] }, "pizza_order" : { "task_messages" : [ ... ], "functions" : [ { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, # Returns [FlowResult, "toppings"] "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } } } } ] } } } Dynamic Flows Dynamic flows create and modify conversation paths at runtime based on data or business logic. Example Implementation Here’s a complete example of a dynamic insurance quote flow: Copy Ask AI from pipecat_flows import FlowManager, FlowArgs, FlowResult # Define handlers and transitions async def collect_age ( args : FlowArgs, flow_manager : FlowManager) -> tuple[AgeResult, NodeConfig]: """Process age collection.""" age = args[ "age" ] # Assemble result result = AgeResult( status = "success" , age = age) # Decide which node to go to next if age < 25 : await flow_manager.set_node_from_config(create_young_adult_node()) else : await flow_manager.set_node_from_config(create_standard_node()) return result, age # Node creation functions def create_initial_node () -> NodeConfig: """Create initial age collection node.""" return { "name" : "initial" , "role_messages" : [ { "role" : "system" , "content" : "You are an insurance quote assistant." } ], "task_messages" : [ { "role" : "system" , "content" : "Ask for the customer's age." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "collect_age" , "handler" : collect_age, "description" : "Collect customer age" , "parameters" : { "type" : "object" , "properties" : { "age" : { "type" : "integer" } } } } } ] } def create_young_adult_node () -> Dict[ str , Any]: """Create node for young adult quotes.""" return { "name" : "young_adult" , "task_messages" : [ { "role" : "system" , "content" : "Explain our special young adult coverage options." } ], "functions" : [ ... ] # Additional quote-specific functions } def create_standard_node () -> Dict[ str , Any]: """Create node for standard quotes.""" return { "name" : "standard" , "task_messages" : [ { "role" : "system" , "content" : "Present our standard coverage options." } ], "functions" : [ ... 
] # Additional quote-specific functions } # Initialize flow manager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize(create_initial_node()) Best Practices Store shared data in flow_manager.state Create separate functions for node creation Flow Editor The Pipecat Flow Editor provides a visual interface for creating and managing conversation flows. It offers a node-based interface that makes it easier to design, visualize, and modify your flows. Visual Design Node Types Start Node (Green): Entry point of your flow Copy Ask AI "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ] } Flow Nodes (Blue): Intermediate states Copy Ask AI "collect_info" : { "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] } End Node (Red): Final state Copy Ask AI "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [{ "type" : "end_conversation" }] } Function Nodes : Edge Functions (Purple): Create transitions Copy Ask AI { "name" : "next_node" , "description" : "Transition to next state" } Node Functions (Orange): Perform operations Copy Ask AI { "name" : "process_data" , "handler" : process_data_handler, "description" : "Process user data" } Naming Conventions Start Node : Use descriptive names (e.g., “greeting”, “welcome”) Flow Nodes : Name based on purpose (e.g., “collect_info”, “verify_data”) End Node : Conventionally named “end” Functions : Use clear, action-oriented names Function Configuration Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_handler, "description" : "Process user data" , "parameters" : { ... } } } When using the Flow Editor, function handlers can be specified using the __function__: token: Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : "__function__:process_data" , # References function in main script "description" : "Process user data" , "parameters" : { ... } } } The handler will be looked up in your main script when the flow is executed. When function handlers are specified in the flow editor, they will be exported with the __function__: token. Using the Editor Creating a New Flow Start with a descriptively named Start Node Add Flow Nodes for each conversation state Connect nodes using Edge Functions Add Node Functions for operations Include an End Node Import/Export Copy Ask AI # Export format { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] }, "process" : { "task_messages" : [ ... ], "functions" : [ ... ], }, "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [ ... 
] } } } Tips Use the visual preview to verify flow logic Test exported configurations Document node purposes and transitions Keep flows modular and maintainable Try the editor at flows.pipecat.ai
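The Example FlowConfig above wires in choose_pizza and select_size handlers but only describes their return values in comments. As a rough sketch (not the library's own reference implementation), handlers following the (result, next_node) convention for static flows could look like this; the node names "pizza_order" and "toppings" come from the config, while the toppings node itself is outside this excerpt:

from pipecat_flows import FlowArgs, FlowManager, FlowResult


async def choose_pizza(args: FlowArgs, flow_manager: FlowManager) -> tuple[None, str]:
    """User chose pizza: nothing to record, just move to the pizza_order node."""
    return None, "pizza_order"


async def select_size(args: FlowArgs, flow_manager: FlowManager) -> tuple[FlowResult, str]:
    """Record the chosen size in shared state, then move to the toppings node."""
    size = args["size"]

    # Keep the selection available to later nodes via flow_manager.state
    order = flow_manager.state.setdefault("order", {})
    order["size"] = size

    return {"status": "success", "size": size}, "toppings"

In a static flow the second element of the tuple is the name of the next node as it appears under "nodes" in flow_config, which is why these handlers return plain strings.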
features_pipecat-flows_10d38465.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/features/pipecat-flows#anthropic-format
Title: Pipecat Flows - Pipecat
==================================================

Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Flows provides a framework for building structured conversations in your AI applications. It enables you to create both predefined conversation paths and dynamically generated flows while handling the complexities of state management and LLM interactions. The framework consists of: A Python module for building conversation flows with Pipecat A visual editor for designing and exporting flow configurations Key Concepts Nodes : Represent conversation states with specific messages and available functions Messages : Set the role and tasks for each node Functions : Define actions and transitions (Node functions for operations, Edge functions for transitions) Actions : Execute operations during state transitions (pre/post actions) State Management : Handle conversation state and data persistence Example Flows Movie Explorer (Static) A static flow demonstrating movie exploration using OpenAI. Shows real API integration with TMDB, structured data collection, and state management. Insurance Policy (Dynamic) A dynamic flow using Google Gemini that adapts policy recommendations based on user responses. Demonstrates runtime node creation and conditional paths. These examples are fully functional and can be run locally. Make sure you have the required dependencies installed and API keys configured. When to Use Static vs Dynamic Flows Static Flows are ideal when: Conversation structure is known upfront Paths follow predefined patterns Flow can be fully configured in advance Example: Customer service scripts, intake forms Dynamic Flows are better when: Paths depend on external data Flow structure needs runtime modification Complex decision trees are involved Example: Personalized recommendations, adaptive workflows Installation If you’re already using Pipecat: Copy Ask AI pip install pipecat-ai-flows If you’re starting fresh: Copy Ask AI # Basic installation pip install pipecat-ai-flows # Install Pipecat with specific LLM provider options: pip install "pipecat-ai[daily,openai,deepgram]" # For OpenAI pip install "pipecat-ai[daily,anthropic,deepgram]" # For Anthropic pip install "pipecat-ai[daily,google,deepgram]" # For Google 💡 Want to design your flows visually? Try the online Flow Editor Core Concepts Designing Conversation Flows Functions in Pipecat Flows serve two key purposes: Processing data (likely by interfacing with external systems and APIs) Advancing the conversation to the next node Each function can do one or both. LLMs decide when to run each function, via their function calling (or tool calling) mechanism. Defining a Function A function is expected to return a (result, next_node) tuple. 
More precisely, it’s expected to return: Copy Ask AI # (result, next_node) Tuple[Optional[FlowResult], Optional[Union[NodeConfig, str ]]] If the function processes data, it should return a non- None value for the first element of the tuple. This value should be a FlowResult or subclass. If the function advances the conversation to the next node, it should return a non- None value for the second element of the tuple. This value can be either: A NodeConfig defining the next node (for dynamic flows) A string identifying the next node (for static flows) Example Function Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig async def check_availability ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: # Read arguments date = args[ "date" ] time = args[ "time" ] # Read previously-stored data party_size = flow_manager.state.get( "party_size" ) # Use flow_manager for immediate user feedback await flow_manager.task.queue_frame(TTSSpeakFrame( "Checking our reservation system..." )) # Store data in flow state for later use flow_manager.state[ "requested_date" ] = date # Interface with reservation system is_available = await reservation_system.check_availability(date, time, party_size) # Assemble result result = { "status" : "success" , "available" : available } # Decide which node to go to next if is_available: next_node = create_confirmation_node() else : next_node = create_no_availability_node() # Return both result and next node return result, next_node Node Structure Each node in your flow represents a conversation state and consists of three main components: Messages Nodes use two types of messages to control the conversation: Role Messages : Define the bot’s personality or role (optional) Copy Ask AI "role_messages" : [ { "role" : "system" , "content" : "You are a friendly pizza ordering assistant. Keep responses casual and upbeat." } ] Task Messages : Define what the bot should do in the current node Copy Ask AI "task_messages" : [ { "role" : "system" , "content" : "Ask the customer which pizza size they'd like: small, medium, or large." } ] Role messages are typically defined in your initial node and inherited by subsequent nodes, while task messages are specific to each node’s purpose. Functions Functions in Pipecat Flows can: Process data Specify node transitions Do both This leads to two conceptual types of functions: Node functions , which only process data. Edge functions , which also (or only) transition to the next node. The function itself ( which you can read more about here ) is usually wrapped in a function configuration, which also contains some metadata about the function. Function Configuration Pipecat Flows supports three ways of specifying function configuration: Provider-specific dictionary format Copy Ask AI # Dictionary format { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } FlowsFunctionSchema Copy Ask AI # Using FlowsFunctionSchema from pipecat_flows import FlowsFunctionSchema size_function = FlowsFunctionSchema( name = "select_size" , description = "Select pizza size" , properties = { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} }, required = [ "size" ], handler = select_size ) # Use in node configuration node_config = { "task_messages" : [ ... 
], "functions" : [size_function] } The FlowsFunctionSchema approach provides some advantages over the provider-specific dictionary format: Consistent structure across LLM providers Simplified parameter definition Cleaner, more readable code Both dictionary and FlowsFunctionSchema approaches are fully supported. FlowsFunctionSchema is recommended for new projects as it provides better type checking and a provider-independent format. Direct function usage (auto-configuration) This approach lets you bypass specifying a standalone function configuration. Instead, relevant function metadata is automatically extracted from the function’s signature and docstring: name description properties (including individual property description s) required Note that the function signature is a bit different when using direct functions. The first parameter is the FlowManager , followed by any others necessary for the function. Copy Ask AI from pipecat_flows import FlowManager, FlowResult async def select_pizza_order ( flow_manager : FlowManager, size : str , pizza_type : str , additional_toppings : list[ str ] = [], ) -> tuple[FlowResult, str ]: """ Record the pizza order details. Args: size (str): Size of the pizza. Must be one of "small", "medium", or "large". pizza_type (str): Type of pizza. Must be one of "pepperoni", "cheese", "supreme", or "vegetarian". additional_toppings (list[str]): List of additional toppings. Defaults to empty list. """ ... # Use in node configuration node_config = { "task_messages" : [ ... ], "functions" : [select_pizza_order] } Node Functions Functions that process data within a single conversational state, without switching nodes. When called, they: Execute their handler to do the data processing (typically by interfacing with an external system or API) Trigger an immediate LLM completion with the result Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, None ]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) return { "status" : "success" , "size" : size }, None # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Edge Functions Functions that specify a transition between nodes (optionally processing data first). 
When called, they: Execute their handler to do any data processing (optional) and determine the next node Add the function result to the LLM context Trigger LLM completion after both the function result and the next node’s messages are in the context Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) result = { "status" : "success" , "size" : size } next_node = create_confirmation_node() return result, next_node # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Actions Actions are operations that execute as part of the lifecycle of a node, with two distinct timing options: Pre-actions: execute when entering the node, before the LLM completion Post-actions: execute after the LLM completion Pre-Actions Execute when entering the node, before LLM inference. Useful for: Providing immediate feedback while waiting for LLM responses Bridging gaps during longer function calls Setting up state or context Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." # Immediate feedback during processing } ], Note that when the node is configured with respond_immediately: False , the pre_actions still run when entering the node, which may be well before LLM inference, depending on how long the user takes to speak first. Avoid mixing tts_say actions with chat completions as this may result in a conversation flow that feels unnatural. tts_say are best used as filler words when the LLM will take time to generate an completion. Post-Actions Execute after LLM inference completes. Useful for: Cleanup operations State finalization Ensuring proper sequence of operations Copy Ask AI "post_actions" : [ { "type" : "end_conversation" # Ensures TTS completes before ending } ] Note that when the node is configured with respond_immediately: False , the post_actions still only run after the first LLM inference, which may be a while depending on how long the user takes to speak first. Timing Considerations Pre-actions : Execute immediately, before any LLM processing begins LLM Inference : Processes the node’s messages and functions Post-actions : Execute after LLM processing and TTS completion For example, when using end_conversation as a post-action, the sequence is: LLM generates response TTS speaks the response End conversation action executes This ordering ensures proper completion of all operations. Action Types Flows comes equipped with pre-canned actions and you can also define your own action behavior. See the reference docs for more information. Deciding Who Speaks First For each node in the conversation, you can decide whether the LLM should respond immediately upon entering the node (the default behavior) or whether the LLM should wait for the user to speak first before responding. You do this using the respond_immediately field. respond_immediately=False may be particularly useful in the very first node, especially in outbound-calling cases where the user has to first answer the phone to trigger the conversation. 
Copy Ask AI NodeConfig( task_messages = [ { "role" : "system" , "content" : "Warmly greet the customer and ask how many people are in their party. This is your only job for now; if the customer asks for something else, politely remind them you can't do it." , } ], respond_immediately = False , # ... other fields ) Keep in mind that if you specify respond_immediately=False , the user may not be aware of the conversational task at hand when entering the node (the bot hasn’t told them yet). While it’s always important to have guardrails in your node messages to keep the conversation on topic, letting the user speak first makes it even more so. Context Management Pipecat Flows provides three strategies for managing conversation context during node transitions: Context Strategies APPEND (default): Adds new messages to the existing context, maintaining the full conversation history RESET : Clears the context and starts fresh with the new node’s messages RESET_WITH_SUMMARY : Resets the context but includes an AI-generated summary of the previous conversation Configuration Context strategies can be configured globally or per-node: Copy Ask AI from pipecat_flows import ContextStrategy, ContextStrategyConfig # Global strategy configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, context_strategy = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far, focusing on decisions made and important information collected." ) ) # Per-node strategy configuration node_config = { "task_messages" : [ ... ], "functions" : [ ... ], "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Provide a concise summary of the customer's order details and preferences." ) } Strategy Selection Choose your strategy based on your conversation needs: Use APPEND when full conversation history is important Use RESET when previous context might confuse the current node’s purpose Use RESET_WITH_SUMMARY for long conversations where key points need to be preserved When using RESET_WITH_SUMMARY, if summary generation fails or times out, the system automatically falls back to RESET strategy for resilience. State Management The state variable in FlowManager is a shared dictionary that persists throughout the conversation. 
Think of it as a conversation memory that lets you: Store user information Track conversation progress Share data between nodes Inform decision-making Here’s a practical example of a pizza ordering flow: Copy Ask AI # Store user choices as they're made async def select_size ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle pizza size selection.""" size = args[ "size" ] # Initialize order in state if it doesn't exist if "order" not in flow_manager.state: flow_manager.state[ "order" ] = {} # Store the selection flow_manager.state[ "order" ][ "size" ] = size return { "status" : "success" , "size" : size}, "toppings" async def select_toppings ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle topping selection.""" topping = args[ "topping" ] # Get existing order and toppings order = flow_manager.state.get( "order" , {}) toppings = order.get( "toppings" , []) # Add new topping toppings.append(topping) order[ "toppings" ] = toppings flow_manager.state[ "order" ] = order return { "status" : "success" , "toppings" : toppings}, "finalize" async def finalize_order ( args : FlowArgs) -> tuple[FlowResult, str ]: """Process the complete order.""" order = flow_manager.state.get( "order" , {}) # Validate order has required information if "size" not in order: return { "status" : "error" , "error" : "No size selected" } # Calculate price based on stored selections size = order[ "size" ] toppings = order.get( "toppings" , []) price = calculate_price(size, len (toppings)) return { "status" : "success" , "summary" : f "Ordered: { size } pizza with { ', ' .join(toppings) } " , "price" : price }, "end" In this example: select_size initializes the order and stores the size select_toppings builds a list of toppings finalize_order uses the stored information to process the complete order The state variable makes it easy to: Build up information across multiple interactions Access previous choices when needed Validate the complete order Calculate final results This is particularly useful when information needs to be collected across multiple conversation turns or when later decisions depend on earlier choices. LLM Provider Support Pipecat Flows automatically handles format differences between LLM providers: OpenAI Format Copy Ask AI "functions" : [{ "type" : "function" , "function" : { "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } } }] Anthropic Format Copy Ask AI "functions" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "input_schema" : { ... } }] Google (Gemini) Format Copy Ask AI "functions" : [{ "function_declarations" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } }] }] You don’t need to handle these differences manually - Pipecat Flows adapts your configuration to the correct format based on your LLM provider. Implementation Approaches Static Flows Static flows use a configuration-driven approach where the entire conversation structure is defined upfront. Basic Setup Copy Ask AI from pipecat_flows import FlowManager # Define flow configuration flow_config = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... 
] } } } # Initialize flow manager with static configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize() Example FlowConfig Copy Ask AI flow_config = { "initial_node" : "start" , "nodes" : { "start" : { "role_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. You must ALWAYS use the available functions to progress the conversation. This is a phone conversation and your responses will be converted to audio. Keep the conversation friendly, casual, and polite. Avoid outputting special characters and emojis." , } ], "task_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. Ask if they want pizza or sushi." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "choose_pizza" , "handler" : choose_pizza, # Returns [None, "pizza_order"] "description" : "User wants pizza" , "parameters" : { "type" : "object" , "properties" : {}} } } ] }, "pizza_order" : { "task_messages" : [ ... ], "functions" : [ { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, # Returns [FlowResult, "toppings"] "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } } } } ] } } } Dynamic Flows Dynamic flows create and modify conversation paths at runtime based on data or business logic. Example Implementation Here’s a complete example of a dynamic insurance quote flow: Copy Ask AI from pipecat_flows import FlowManager, FlowArgs, FlowResult # Define handlers and transitions async def collect_age ( args : FlowArgs, flow_manager : FlowManager) -> tuple[AgeResult, NodeConfig]: """Process age collection.""" age = args[ "age" ] # Assemble result result = AgeResult( status = "success" , age = age) # Decide which node to go to next if age < 25 : await flow_manager.set_node_from_config(create_young_adult_node()) else : await flow_manager.set_node_from_config(create_standard_node()) return result, age # Node creation functions def create_initial_node () -> NodeConfig: """Create initial age collection node.""" return { "name" : "initial" , "role_messages" : [ { "role" : "system" , "content" : "You are an insurance quote assistant." } ], "task_messages" : [ { "role" : "system" , "content" : "Ask for the customer's age." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "collect_age" , "handler" : collect_age, "description" : "Collect customer age" , "parameters" : { "type" : "object" , "properties" : { "age" : { "type" : "integer" } } } } } ] } def create_young_adult_node () -> Dict[ str , Any]: """Create node for young adult quotes.""" return { "name" : "young_adult" , "task_messages" : [ { "role" : "system" , "content" : "Explain our special young adult coverage options." } ], "functions" : [ ... ] # Additional quote-specific functions } def create_standard_node () -> Dict[ str , Any]: """Create node for standard quotes.""" return { "name" : "standard" , "task_messages" : [ { "role" : "system" , "content" : "Present our standard coverage options." } ], "functions" : [ ... 
] # Additional quote-specific functions } # Initialize flow manager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize(create_initial_node()) Best Practices Store shared data in flow_manager.state Create separate functions for node creation Flow Editor The Pipecat Flow Editor provides a visual interface for creating and managing conversation flows. It offers a node-based interface that makes it easier to design, visualize, and modify your flows. Visual Design Node Types Start Node (Green): Entry point of your flow Copy Ask AI "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ] } Flow Nodes (Blue): Intermediate states Copy Ask AI "collect_info" : { "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] } End Node (Red): Final state Copy Ask AI "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [{ "type" : "end_conversation" }] } Function Nodes : Edge Functions (Purple): Create transitions Copy Ask AI { "name" : "next_node" , "description" : "Transition to next state" } Node Functions (Orange): Perform operations Copy Ask AI { "name" : "process_data" , "handler" : process_data_handler, "description" : "Process user data" } Naming Conventions Start Node : Use descriptive names (e.g., “greeting”, “welcome”) Flow Nodes : Name based on purpose (e.g., “collect_info”, “verify_data”) End Node : Conventionally named “end” Functions : Use clear, action-oriented names Function Configuration Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_handler, "description" : "Process user data" , "parameters" : { ... } } } When using the Flow Editor, function handlers can be specified using the __function__: token: Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : "__function__:process_data" , # References function in main script "description" : "Process user data" , "parameters" : { ... } } } The handler will be looked up in your main script when the flow is executed. When function handlers are specified in the flow editor, they will be exported with the __function__: token. Using the Editor Creating a New Flow Start with a descriptively named Start Node Add Flow Nodes for each conversation state Connect nodes using Edge Functions Add Node Functions for operations Include an End Node Import/Export Copy Ask AI # Export format { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] }, "process" : { "task_messages" : [ ... ], "functions" : [ ... ], }, "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [ ... 
] } } } Tips Use the visual preview to verify flow logic Test exported configurations Document node purposes and transitions Keep flows modular and maintainable Try the editor at flows.pipecat.ai
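The finalize_order handler in the state-management example above calls calculate_price, which is never defined on this page; it is presumably an application-specific helper rather than part of Pipecat Flows. A minimal sketch, with purely illustrative prices:

# Hypothetical pricing helper assumed by finalize_order. The base prices and
# per-topping charge are made up for illustration only.
BASE_PRICES = {"small": 8.00, "medium": 10.00, "large": 12.00}
TOPPING_PRICE = 1.50


def calculate_price(size: str, topping_count: int) -> float:
    """Return the order total for a given size and number of toppings."""
    return BASE_PRICES[size] + TOPPING_PRICE * topping_count


# Example: calculate_price("medium", 2) -> 13.0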
features_pipecat-flows_c7ec073f.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/features/pipecat-flows#actions
Title: Pipecat Flows - Pipecat
==================================================

Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Flows provides a framework for building structured conversations in your AI applications. It enables you to create both predefined conversation paths and dynamically generated flows while handling the complexities of state management and LLM interactions. The framework consists of: A Python module for building conversation flows with Pipecat A visual editor for designing and exporting flow configurations Key Concepts Nodes : Represent conversation states with specific messages and available functions Messages : Set the role and tasks for each node Functions : Define actions and transitions (Node functions for operations, Edge functions for transitions) Actions : Execute operations during state transitions (pre/post actions) State Management : Handle conversation state and data persistence Example Flows Movie Explorer (Static) A static flow demonstrating movie exploration using OpenAI. Shows real API integration with TMDB, structured data collection, and state management. Insurance Policy (Dynamic) A dynamic flow using Google Gemini that adapts policy recommendations based on user responses. Demonstrates runtime node creation and conditional paths. These examples are fully functional and can be run locally. Make sure you have the required dependencies installed and API keys configured. When to Use Static vs Dynamic Flows Static Flows are ideal when: Conversation structure is known upfront Paths follow predefined patterns Flow can be fully configured in advance Example: Customer service scripts, intake forms Dynamic Flows are better when: Paths depend on external data Flow structure needs runtime modification Complex decision trees are involved Example: Personalized recommendations, adaptive workflows Installation If you’re already using Pipecat: Copy Ask AI pip install pipecat-ai-flows If you’re starting fresh: Copy Ask AI # Basic installation pip install pipecat-ai-flows # Install Pipecat with specific LLM provider options: pip install "pipecat-ai[daily,openai,deepgram]" # For OpenAI pip install "pipecat-ai[daily,anthropic,deepgram]" # For Anthropic pip install "pipecat-ai[daily,google,deepgram]" # For Google 💡 Want to design your flows visually? Try the online Flow Editor Core Concepts Designing Conversation Flows Functions in Pipecat Flows serve two key purposes: Processing data (likely by interfacing with external systems and APIs) Advancing the conversation to the next node Each function can do one or both. LLMs decide when to run each function, via their function calling (or tool calling) mechanism. Defining a Function A function is expected to return a (result, next_node) tuple. 
More precisely, it’s expected to return: Copy Ask AI # (result, next_node) Tuple[Optional[FlowResult], Optional[Union[NodeConfig, str ]]] If the function processes data, it should return a non- None value for the first element of the tuple. This value should be a FlowResult or subclass. If the function advances the conversation to the next node, it should return a non- None value for the second element of the tuple. This value can be either: A NodeConfig defining the next node (for dynamic flows) A string identifying the next node (for static flows) Example Function Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig async def check_availability ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: # Read arguments date = args[ "date" ] time = args[ "time" ] # Read previously-stored data party_size = flow_manager.state.get( "party_size" ) # Use flow_manager for immediate user feedback await flow_manager.task.queue_frame(TTSSpeakFrame( "Checking our reservation system..." )) # Store data in flow state for later use flow_manager.state[ "requested_date" ] = date # Interface with reservation system is_available = await reservation_system.check_availability(date, time, party_size) # Assemble result result = { "status" : "success" , "available" : available } # Decide which node to go to next if is_available: next_node = create_confirmation_node() else : next_node = create_no_availability_node() # Return both result and next node return result, next_node Node Structure Each node in your flow represents a conversation state and consists of three main components: Messages Nodes use two types of messages to control the conversation: Role Messages : Define the bot’s personality or role (optional) Copy Ask AI "role_messages" : [ { "role" : "system" , "content" : "You are a friendly pizza ordering assistant. Keep responses casual and upbeat." } ] Task Messages : Define what the bot should do in the current node Copy Ask AI "task_messages" : [ { "role" : "system" , "content" : "Ask the customer which pizza size they'd like: small, medium, or large." } ] Role messages are typically defined in your initial node and inherited by subsequent nodes, while task messages are specific to each node’s purpose. Functions Functions in Pipecat Flows can: Process data Specify node transitions Do both This leads to two conceptual types of functions: Node functions , which only process data. Edge functions , which also (or only) transition to the next node. The function itself ( which you can read more about here ) is usually wrapped in a function configuration, which also contains some metadata about the function. Function Configuration Pipecat Flows supports three ways of specifying function configuration: Provider-specific dictionary format Copy Ask AI # Dictionary format { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } FlowsFunctionSchema Copy Ask AI # Using FlowsFunctionSchema from pipecat_flows import FlowsFunctionSchema size_function = FlowsFunctionSchema( name = "select_size" , description = "Select pizza size" , properties = { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} }, required = [ "size" ], handler = select_size ) # Use in node configuration node_config = { "task_messages" : [ ... 
], "functions" : [size_function] } The FlowsFunctionSchema approach provides some advantages over the provider-specific dictionary format: Consistent structure across LLM providers Simplified parameter definition Cleaner, more readable code Both dictionary and FlowsFunctionSchema approaches are fully supported. FlowsFunctionSchema is recommended for new projects as it provides better type checking and a provider-independent format. Direct function usage (auto-configuration) This approach lets you bypass specifying a standalone function configuration. Instead, relevant function metadata is automatically extracted from the function’s signature and docstring: name description properties (including individual property description s) required Note that the function signature is a bit different when using direct functions. The first parameter is the FlowManager , followed by any others necessary for the function. Copy Ask AI from pipecat_flows import FlowManager, FlowResult async def select_pizza_order ( flow_manager : FlowManager, size : str , pizza_type : str , additional_toppings : list[ str ] = [], ) -> tuple[FlowResult, str ]: """ Record the pizza order details. Args: size (str): Size of the pizza. Must be one of "small", "medium", or "large". pizza_type (str): Type of pizza. Must be one of "pepperoni", "cheese", "supreme", or "vegetarian". additional_toppings (list[str]): List of additional toppings. Defaults to empty list. """ ... # Use in node configuration node_config = { "task_messages" : [ ... ], "functions" : [select_pizza_order] } Node Functions Functions that process data within a single conversational state, without switching nodes. When called, they: Execute their handler to do the data processing (typically by interfacing with an external system or API) Trigger an immediate LLM completion with the result Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, None ]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) return { "status" : "success" , "size" : size }, None # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Edge Functions Functions that specify a transition between nodes (optionally processing data first). 
When called, they: Execute their handler to do any data processing (optional) and determine the next node Add the function result to the LLM context Trigger LLM completion after both the function result and the next node’s messages are in the context Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) result = { "status" : "success" , "size" : size } next_node = create_confirmation_node() return result, next_node # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Actions Actions are operations that execute as part of the lifecycle of a node, with two distinct timing options: Pre-actions: execute when entering the node, before the LLM completion Post-actions: execute after the LLM completion Pre-Actions Execute when entering the node, before LLM inference. Useful for: Providing immediate feedback while waiting for LLM responses Bridging gaps during longer function calls Setting up state or context Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." # Immediate feedback during processing } ], Note that when the node is configured with respond_immediately: False , the pre_actions still run when entering the node, which may be well before LLM inference, depending on how long the user takes to speak first. Avoid mixing tts_say actions with chat completions as this may result in a conversation flow that feels unnatural. tts_say are best used as filler words when the LLM will take time to generate an completion. Post-Actions Execute after LLM inference completes. Useful for: Cleanup operations State finalization Ensuring proper sequence of operations Copy Ask AI "post_actions" : [ { "type" : "end_conversation" # Ensures TTS completes before ending } ] Note that when the node is configured with respond_immediately: False , the post_actions still only run after the first LLM inference, which may be a while depending on how long the user takes to speak first. Timing Considerations Pre-actions : Execute immediately, before any LLM processing begins LLM Inference : Processes the node’s messages and functions Post-actions : Execute after LLM processing and TTS completion For example, when using end_conversation as a post-action, the sequence is: LLM generates response TTS speaks the response End conversation action executes This ordering ensures proper completion of all operations. Action Types Flows comes equipped with pre-canned actions and you can also define your own action behavior. See the reference docs for more information. Deciding Who Speaks First For each node in the conversation, you can decide whether the LLM should respond immediately upon entering the node (the default behavior) or whether the LLM should wait for the user to speak first before responding. You do this using the respond_immediately field. respond_immediately=False may be particularly useful in the very first node, especially in outbound-calling cases where the user has to first answer the phone to trigger the conversation. 
Copy Ask AI NodeConfig( task_messages = [ { "role" : "system" , "content" : "Warmly greet the customer and ask how many people are in their party. This is your only job for now; if the customer asks for something else, politely remind them you can't do it." , } ], respond_immediately = False , # ... other fields ) Keep in mind that if you specify respond_immediately=False , the user may not be aware of the conversational task at hand when entering the node (the bot hasn’t told them yet). While it’s always important to have guardrails in your node messages to keep the conversation on topic, letting the user speak first makes it even more so. Context Management Pipecat Flows provides three strategies for managing conversation context during node transitions: Context Strategies APPEND (default): Adds new messages to the existing context, maintaining the full conversation history RESET : Clears the context and starts fresh with the new node’s messages RESET_WITH_SUMMARY : Resets the context but includes an AI-generated summary of the previous conversation Configuration Context strategies can be configured globally or per-node: Copy Ask AI from pipecat_flows import ContextStrategy, ContextStrategyConfig # Global strategy configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, context_strategy = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far, focusing on decisions made and important information collected." ) ) # Per-node strategy configuration node_config = { "task_messages" : [ ... ], "functions" : [ ... ], "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Provide a concise summary of the customer's order details and preferences." ) } Strategy Selection Choose your strategy based on your conversation needs: Use APPEND when full conversation history is important Use RESET when previous context might confuse the current node’s purpose Use RESET_WITH_SUMMARY for long conversations where key points need to be preserved When using RESET_WITH_SUMMARY, if summary generation fails or times out, the system automatically falls back to RESET strategy for resilience. State Management The state variable in FlowManager is a shared dictionary that persists throughout the conversation. 
Think of it as a conversation memory that lets you: Store user information Track conversation progress Share data between nodes Inform decision-making Here’s a practical example of a pizza ordering flow: Copy Ask AI # Store user choices as they're made async def select_size ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle pizza size selection.""" size = args[ "size" ] # Initialize order in state if it doesn't exist if "order" not in flow_manager.state: flow_manager.state[ "order" ] = {} # Store the selection flow_manager.state[ "order" ][ "size" ] = size return { "status" : "success" , "size" : size}, "toppings" async def select_toppings ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle topping selection.""" topping = args[ "topping" ] # Get existing order and toppings order = flow_manager.state.get( "order" , {}) toppings = order.get( "toppings" , []) # Add new topping toppings.append(topping) order[ "toppings" ] = toppings flow_manager.state[ "order" ] = order return { "status" : "success" , "toppings" : toppings}, "finalize" async def finalize_order ( args : FlowArgs) -> tuple[FlowResult, str ]: """Process the complete order.""" order = flow_manager.state.get( "order" , {}) # Validate order has required information if "size" not in order: return { "status" : "error" , "error" : "No size selected" } # Calculate price based on stored selections size = order[ "size" ] toppings = order.get( "toppings" , []) price = calculate_price(size, len (toppings)) return { "status" : "success" , "summary" : f "Ordered: { size } pizza with { ', ' .join(toppings) } " , "price" : price }, "end" In this example: select_size initializes the order and stores the size select_toppings builds a list of toppings finalize_order uses the stored information to process the complete order The state variable makes it easy to: Build up information across multiple interactions Access previous choices when needed Validate the complete order Calculate final results This is particularly useful when information needs to be collected across multiple conversation turns or when later decisions depend on earlier choices. LLM Provider Support Pipecat Flows automatically handles format differences between LLM providers: OpenAI Format Copy Ask AI "functions" : [{ "type" : "function" , "function" : { "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } } }] Anthropic Format Copy Ask AI "functions" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "input_schema" : { ... } }] Google (Gemini) Format Copy Ask AI "functions" : [{ "function_declarations" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } }] }] You don’t need to handle these differences manually - Pipecat Flows adapts your configuration to the correct format based on your LLM provider. Implementation Approaches Static Flows Static flows use a configuration-driven approach where the entire conversation structure is defined upfront. Basic Setup Copy Ask AI from pipecat_flows import FlowManager # Define flow configuration flow_config = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... 
] } } } # Initialize flow manager with static configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize() Example FlowConfig Copy Ask AI flow_config = { "initial_node" : "start" , "nodes" : { "start" : { "role_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. You must ALWAYS use the available functions to progress the conversation. This is a phone conversation and your responses will be converted to audio. Keep the conversation friendly, casual, and polite. Avoid outputting special characters and emojis." , } ], "task_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. Ask if they want pizza or sushi." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "choose_pizza" , "handler" : choose_pizza, # Returns [None, "pizza_order"] "description" : "User wants pizza" , "parameters" : { "type" : "object" , "properties" : {}} } } ] }, "pizza_order" : { "task_messages" : [ ... ], "functions" : [ { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, # Returns [FlowResult, "toppings"] "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } } } } ] } } } Dynamic Flows Dynamic flows create and modify conversation paths at runtime based on data or business logic. Example Implementation Here’s a complete example of a dynamic insurance quote flow: Copy Ask AI from pipecat_flows import FlowManager, FlowArgs, FlowResult # Define handlers and transitions async def collect_age ( args : FlowArgs, flow_manager : FlowManager) -> tuple[AgeResult, NodeConfig]: """Process age collection.""" age = args[ "age" ] # Assemble result result = AgeResult( status = "success" , age = age) # Decide which node to go to next if age < 25 : await flow_manager.set_node_from_config(create_young_adult_node()) else : await flow_manager.set_node_from_config(create_standard_node()) return result, age # Node creation functions def create_initial_node () -> NodeConfig: """Create initial age collection node.""" return { "name" : "initial" , "role_messages" : [ { "role" : "system" , "content" : "You are an insurance quote assistant." } ], "task_messages" : [ { "role" : "system" , "content" : "Ask for the customer's age." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "collect_age" , "handler" : collect_age, "description" : "Collect customer age" , "parameters" : { "type" : "object" , "properties" : { "age" : { "type" : "integer" } } } } } ] } def create_young_adult_node () -> Dict[ str , Any]: """Create node for young adult quotes.""" return { "name" : "young_adult" , "task_messages" : [ { "role" : "system" , "content" : "Explain our special young adult coverage options." } ], "functions" : [ ... ] # Additional quote-specific functions } def create_standard_node () -> Dict[ str , Any]: """Create node for standard quotes.""" return { "name" : "standard" , "task_messages" : [ { "role" : "system" , "content" : "Present our standard coverage options." } ], "functions" : [ ... 
] # Additional quote-specific functions } # Initialize flow manager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize(create_initial_node()) Best Practices Store shared data in flow_manager.state Create separate functions for node creation Flow Editor The Pipecat Flow Editor provides a visual interface for creating and managing conversation flows. It offers a node-based interface that makes it easier to design, visualize, and modify your flows. Visual Design Node Types Start Node (Green): Entry point of your flow Copy Ask AI "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ] } Flow Nodes (Blue): Intermediate states Copy Ask AI "collect_info" : { "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] } End Node (Red): Final state Copy Ask AI "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [{ "type" : "end_conversation" }] } Function Nodes : Edge Functions (Purple): Create transitions Copy Ask AI { "name" : "next_node" , "description" : "Transition to next state" } Node Functions (Orange): Perform operations Copy Ask AI { "name" : "process_data" , "handler" : process_data_handler, "description" : "Process user data" } Naming Conventions Start Node : Use descriptive names (e.g., “greeting”, “welcome”) Flow Nodes : Name based on purpose (e.g., “collect_info”, “verify_data”) End Node : Conventionally named “end” Functions : Use clear, action-oriented names Function Configuration Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_handler, "description" : "Process user data" , "parameters" : { ... } } } When using the Flow Editor, function handlers can be specified using the __function__: token: Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : "__function__:process_data" , # References function in main script "description" : "Process user data" , "parameters" : { ... } } } The handler will be looked up in your main script when the flow is executed. When function handlers are specified in the flow editor, they will be exported with the __function__: token. Using the Editor Creating a New Flow Start with a descriptively named Start Node Add Flow Nodes for each conversation state Connect nodes using Edge Functions Add Node Functions for operations Include an End Node Import/Export Copy Ask AI # Export format { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] }, "process" : { "task_messages" : [ ... ], "functions" : [ ... ], }, "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [ ... 
] } } } Tips Use the visual preview to verify flow logic Test exported configurations Document node purposes and transitions Keep flows modular and maintainable Try the editor at flows.pipecat.ai
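One thing worth noting about the dynamic-flow collect_age example above: it both awaits flow_manager.set_node_from_config() and returns result, age, whereas the convention described under "Defining a Function" is to return a (result, next_node) tuple. A hedged alternative that leans only on the return tuple might look like the following sketch; AgeResult is assumed to be a FlowResult subclass as the example implies, and create_young_adult_node / create_standard_node are the node-creation helpers defined in the example:

from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig


# Assumed result type: a FlowResult subclass carrying the collected age.
class AgeResult(FlowResult):
    age: int


async def collect_age(args: FlowArgs, flow_manager: FlowManager) -> tuple[AgeResult, NodeConfig]:
    """Record the customer's age and pick the next node based on it."""
    age = args["age"]
    result = AgeResult(status="success", age=age)

    # Let the returned tuple drive the transition rather than calling
    # set_node_from_config() inside the handler.
    if age < 25:
        next_node = create_young_adult_node()
    else:
        next_node = create_standard_node()

    return result, next_node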
|
features_pipecat-flows_e0431dc6.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/features/pipecat-flows#naming-conventions
Title: Pipecat Flows - Pipecat
==================================================
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Features Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat Flows provides a framework for building structured conversations in your AI applications. It enables you to create both predefined conversation paths and dynamically generated flows while handling the complexities of state management and LLM interactions. The framework consists of: A Python module for building conversation flows with Pipecat A visual editor for designing and exporting flow configurations Key Concepts Nodes : Represent conversation states with specific messages and available functions Messages : Set the role and tasks for each node Functions : Define actions and transitions (Node functions for operations, Edge functions for transitions) Actions : Execute operations during state transitions (pre/post actions) State Management : Handle conversation state and data persistence Example Flows Movie Explorer (Static) A static flow demonstrating movie exploration using OpenAI. Shows real API integration with TMDB, structured data collection, and state management. Insurance Policy (Dynamic) A dynamic flow using Google Gemini that adapts policy recommendations based on user responses. Demonstrates runtime node creation and conditional paths. These examples are fully functional and can be run locally. Make sure you have the required dependencies installed and API keys configured. When to Use Static vs Dynamic Flows Static Flows are ideal when: Conversation structure is known upfront Paths follow predefined patterns Flow can be fully configured in advance Example: Customer service scripts, intake forms Dynamic Flows are better when: Paths depend on external data Flow structure needs runtime modification Complex decision trees are involved Example: Personalized recommendations, adaptive workflows Installation If you’re already using Pipecat: Copy Ask AI pip install pipecat-ai-flows If you’re starting fresh: Copy Ask AI # Basic installation pip install pipecat-ai-flows # Install Pipecat with specific LLM provider options: pip install "pipecat-ai[daily,openai,deepgram]" # For OpenAI pip install "pipecat-ai[daily,anthropic,deepgram]" # For Anthropic pip install "pipecat-ai[daily,google,deepgram]" # For Google 💡 Want to design your flows visually? Try the online Flow Editor Core Concepts Designing Conversation Flows Functions in Pipecat Flows serve two key purposes: Processing data (likely by interfacing with external systems and APIs) Advancing the conversation to the next node Each function can do one or both. LLMs decide when to run each function, via their function calling (or tool calling) mechanism. Defining a Function A function is expected to return a (result, next_node) tuple. 
More precisely, it’s expected to return: Copy Ask AI # (result, next_node) Tuple[Optional[FlowResult], Optional[Union[NodeConfig, str ]]] If the function processes data, it should return a non- None value for the first element of the tuple. This value should be a FlowResult or subclass. If the function advances the conversation to the next node, it should return a non- None value for the second element of the tuple. This value can be either: A NodeConfig defining the next node (for dynamic flows) A string identifying the next node (for static flows) Example Function Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult, NodeConfig async def check_availability ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: # Read arguments date = args[ "date" ] time = args[ "time" ] # Read previously-stored data party_size = flow_manager.state.get( "party_size" ) # Use flow_manager for immediate user feedback await flow_manager.task.queue_frame(TTSSpeakFrame( "Checking our reservation system..." )) # Store data in flow state for later use flow_manager.state[ "requested_date" ] = date # Interface with reservation system is_available = await reservation_system.check_availability(date, time, party_size) # Assemble result result = { "status" : "success" , "available" : is_available } # Decide which node to go to next if is_available: next_node = create_confirmation_node() else : next_node = create_no_availability_node() # Return both result and next node return result, next_node Node Structure Each node in your flow represents a conversation state and consists of three main components: Messages Nodes use two types of messages to control the conversation: Role Messages : Define the bot’s personality or role (optional) Copy Ask AI "role_messages" : [ { "role" : "system" , "content" : "You are a friendly pizza ordering assistant. Keep responses casual and upbeat." } ] Task Messages : Define what the bot should do in the current node Copy Ask AI "task_messages" : [ { "role" : "system" , "content" : "Ask the customer which pizza size they'd like: small, medium, or large." } ] Role messages are typically defined in your initial node and inherited by subsequent nodes, while task messages are specific to each node’s purpose. Functions Functions in Pipecat Flows can: Process data Specify node transitions Do both This leads to two conceptual types of functions: Node functions , which only process data. Edge functions , which also (or only) transition to the next node. The function itself ( which you can read more about here ) is usually wrapped in a function configuration, which also contains some metadata about the function. Function Configuration Pipecat Flows supports three ways of specifying function configuration: Provider-specific dictionary format Copy Ask AI # Dictionary format { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } FlowsFunctionSchema Copy Ask AI # Using FlowsFunctionSchema from pipecat_flows import FlowsFunctionSchema size_function = FlowsFunctionSchema( name = "select_size" , description = "Select pizza size" , properties = { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} }, required = [ "size" ], handler = select_size ) # Use in node configuration node_config = { "task_messages" : [ ...
], "functions" : [size_function] } The FlowsFunctionSchema approach provides some advantages over the provider-specific dictionary format: Consistent structure across LLM providers Simplified parameter definition Cleaner, more readable code Both dictionary and FlowsFunctionSchema approaches are fully supported. FlowsFunctionSchema is recommended for new projects as it provides better type checking and a provider-independent format. Direct function usage (auto-configuration) This approach lets you bypass specifying a standalone function configuration. Instead, relevant function metadata is automatically extracted from the function’s signature and docstring: name description properties (including individual property description s) required Note that the function signature is a bit different when using direct functions. The first parameter is the FlowManager , followed by any others necessary for the function. Copy Ask AI from pipecat_flows import FlowManager, FlowResult async def select_pizza_order ( flow_manager : FlowManager, size : str , pizza_type : str , additional_toppings : list[ str ] = [], ) -> tuple[FlowResult, str ]: """ Record the pizza order details. Args: size (str): Size of the pizza. Must be one of "small", "medium", or "large". pizza_type (str): Type of pizza. Must be one of "pepperoni", "cheese", "supreme", or "vegetarian". additional_toppings (list[str]): List of additional toppings. Defaults to empty list. """ ... # Use in node configuration node_config = { "task_messages" : [ ... ], "functions" : [select_pizza_order] } Node Functions Functions that process data within a single conversational state, without switching nodes. When called, they: Execute their handler to do the data processing (typically by interfacing with an external system or API) Trigger an immediate LLM completion with the result Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, None ]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) return { "status" : "success" , "size" : size }, None # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Edge Functions Functions that specify a transition between nodes (optionally processing data first). 
When called, they: Execute their handler to do any data processing (optional) and determine the next node Add the function result to the LLM context Trigger LLM completion after both the function result and the next node’s messages are in the context Copy Ask AI from pipecat_flows import FlowArgs, FlowManager, FlowResult async def select_size ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Process pizza size selection.""" size = args[ "size" ] await ordering_system.record_size_selection(size) result = { "status" : "success" , "size" : size } next_node = create_confirmation_node() return result, next_node # Function configuration { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } }, } } Actions Actions are operations that execute as part of the lifecycle of a node, with two distinct timing options: Pre-actions: execute when entering the node, before the LLM completion Post-actions: execute after the LLM completion Pre-Actions Execute when entering the node, before LLM inference. Useful for: Providing immediate feedback while waiting for LLM responses Bridging gaps during longer function calls Setting up state or context Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." # Immediate feedback during processing } ], Note that when the node is configured with respond_immediately: False , the pre_actions still run when entering the node, which may be well before LLM inference, depending on how long the user takes to speak first. Avoid mixing tts_say actions with chat completions as this may result in a conversation flow that feels unnatural. tts_say actions are best used as filler words when the LLM will take time to generate a completion. Post-Actions Execute after LLM inference completes. Useful for: Cleanup operations State finalization Ensuring proper sequence of operations Copy Ask AI "post_actions" : [ { "type" : "end_conversation" # Ensures TTS completes before ending } ] Note that when the node is configured with respond_immediately: False , the post_actions still only run after the first LLM inference, which may be a while depending on how long the user takes to speak first. Timing Considerations Pre-actions : Execute immediately, before any LLM processing begins LLM Inference : Processes the node’s messages and functions Post-actions : Execute after LLM processing and TTS completion For example, when using end_conversation as a post-action, the sequence is: LLM generates response TTS speaks the response End conversation action executes This ordering ensures proper completion of all operations. Action Types Flows comes equipped with pre-canned actions and you can also define your own action behavior. See the reference docs for more information. Deciding Who Speaks First For each node in the conversation, you can decide whether the LLM should respond immediately upon entering the node (the default behavior) or whether the LLM should wait for the user to speak first before responding. You do this using the respond_immediately field. respond_immediately=False may be particularly useful in the very first node, especially in outbound-calling cases where the user has to first answer the phone to trigger the conversation.
Copy Ask AI NodeConfig( task_messages = [ { "role" : "system" , "content" : "Warmly greet the customer and ask how many people are in their party. This is your only job for now; if the customer asks for something else, politely remind them you can't do it." , } ], respond_immediately = False , # ... other fields ) Keep in mind that if you specify respond_immediately=False , the user may not be aware of the conversational task at hand when entering the node (the bot hasn’t told them yet). While it’s always important to have guardrails in your node messages to keep the conversation on topic, letting the user speak first makes it even more so. Context Management Pipecat Flows provides three strategies for managing conversation context during node transitions: Context Strategies APPEND (default): Adds new messages to the existing context, maintaining the full conversation history RESET : Clears the context and starts fresh with the new node’s messages RESET_WITH_SUMMARY : Resets the context but includes an AI-generated summary of the previous conversation Configuration Context strategies can be configured globally or per-node: Copy Ask AI from pipecat_flows import ContextStrategy, ContextStrategyConfig # Global strategy configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, context_strategy = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far, focusing on decisions made and important information collected." ) ) # Per-node strategy configuration node_config = { "task_messages" : [ ... ], "functions" : [ ... ], "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Provide a concise summary of the customer's order details and preferences." ) } Strategy Selection Choose your strategy based on your conversation needs: Use APPEND when full conversation history is important Use RESET when previous context might confuse the current node’s purpose Use RESET_WITH_SUMMARY for long conversations where key points need to be preserved When using RESET_WITH_SUMMARY, if summary generation fails or times out, the system automatically falls back to RESET strategy for resilience. State Management The state variable in FlowManager is a shared dictionary that persists throughout the conversation. 
Think of it as a conversation memory that lets you: Store user information Track conversation progress Share data between nodes Inform decision-making Here’s a practical example of a pizza ordering flow: Copy Ask AI # Store user choices as they're made async def select_size ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle pizza size selection.""" size = args[ "size" ] # Initialize order in state if it doesn't exist if "order" not in flow_manager.state: flow_manager.state[ "order" ] = {} # Store the selection flow_manager.state[ "order" ][ "size" ] = size return { "status" : "success" , "size" : size}, "toppings" async def select_toppings ( args : FlowArgs) -> tuple[FlowResult, str ]: """Handle topping selection.""" topping = args[ "topping" ] # Get existing order and toppings order = flow_manager.state.get( "order" , {}) toppings = order.get( "toppings" , []) # Add new topping toppings.append(topping) order[ "toppings" ] = toppings flow_manager.state[ "order" ] = order return { "status" : "success" , "toppings" : toppings}, "finalize" async def finalize_order ( args : FlowArgs) -> tuple[FlowResult, str ]: """Process the complete order.""" order = flow_manager.state.get( "order" , {}) # Validate order has required information if "size" not in order: return { "status" : "error" , "error" : "No size selected" } # Calculate price based on stored selections size = order[ "size" ] toppings = order.get( "toppings" , []) price = calculate_price(size, len (toppings)) return { "status" : "success" , "summary" : f "Ordered: { size } pizza with { ', ' .join(toppings) } " , "price" : price }, "end" In this example: select_size initializes the order and stores the size select_toppings builds a list of toppings finalize_order uses the stored information to process the complete order The state variable makes it easy to: Build up information across multiple interactions Access previous choices when needed Validate the complete order Calculate final results This is particularly useful when information needs to be collected across multiple conversation turns or when later decisions depend on earlier choices. LLM Provider Support Pipecat Flows automatically handles format differences between LLM providers: OpenAI Format Copy Ask AI "functions" : [{ "type" : "function" , "function" : { "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } } }] Anthropic Format Copy Ask AI "functions" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "input_schema" : { ... } }] Google (Gemini) Format Copy Ask AI "functions" : [{ "function_declarations" : [{ "name" : "function_name" , "handler" : select_size, "description" : "description" , "parameters" : { ... } }] }] You don’t need to handle these differences manually - Pipecat Flows adapts your configuration to the correct format based on your LLM provider. Implementation Approaches Static Flows Static flows use a configuration-driven approach where the entire conversation structure is defined upfront. Basic Setup Copy Ask AI from pipecat_flows import FlowManager # Define flow configuration flow_config = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... 
] } } } # Initialize flow manager with static configuration flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize() Example FlowConfig Copy Ask AI flow_config = { "initial_node" : "start" , "nodes" : { "start" : { "role_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. You must ALWAYS use the available functions to progress the conversation. This is a phone conversation and your responses will be converted to audio. Keep the conversation friendly, casual, and polite. Avoid outputting special characters and emojis." , } ], "task_messages" : [ { "role" : "system" , "content" : "You are an order-taking assistant. Ask if they want pizza or sushi." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "choose_pizza" , "handler" : choose_pizza, # Returns [None, "pizza_order"] "description" : "User wants pizza" , "parameters" : { "type" : "object" , "properties" : {}} } } ] }, "pizza_order" : { "task_messages" : [ ... ], "functions" : [ { "type" : "function" , "function" : { "name" : "select_size" , "handler" : select_size, # Returns [FlowResult, "toppings"] "description" : "Select pizza size" , "parameters" : { "type" : "object" , "properties" : { "size" : { "type" : "string" , "enum" : [ "small" , "medium" , "large" ]} } } } } ] } } } Dynamic Flows Dynamic flows create and modify conversation paths at runtime based on data or business logic. Example Implementation Here’s a complete example of a dynamic insurance quote flow: Copy Ask AI from pipecat_flows import FlowManager, FlowArgs, FlowResult # Define handlers and transitions async def collect_age ( args : FlowArgs, flow_manager : FlowManager) -> tuple[AgeResult, NodeConfig]: """Process age collection.""" age = args[ "age" ] # Assemble result result = AgeResult( status = "success" , age = age) # Decide which node to go to next if age < 25 : next_node = create_young_adult_node() else : next_node = create_standard_node() # Return both result and next node return result, next_node # Node creation functions def create_initial_node () -> NodeConfig: """Create initial age collection node.""" return { "name" : "initial" , "role_messages" : [ { "role" : "system" , "content" : "You are an insurance quote assistant." } ], "task_messages" : [ { "role" : "system" , "content" : "Ask for the customer's age." } ], "functions" : [ { "type" : "function" , "function" : { "name" : "collect_age" , "handler" : collect_age, "description" : "Collect customer age" , "parameters" : { "type" : "object" , "properties" : { "age" : { "type" : "integer" } } } } } ] } def create_young_adult_node () -> Dict[ str , Any]: """Create node for young adult quotes.""" return { "name" : "young_adult" , "task_messages" : [ { "role" : "system" , "content" : "Explain our special young adult coverage options." } ], "functions" : [ ... ] # Additional quote-specific functions } def create_standard_node () -> Dict[ str , Any]: """Create node for standard quotes.""" return { "name" : "standard" , "task_messages" : [ { "role" : "system" , "content" : "Present our standard coverage options." } ], "functions" : [ ...
] # Additional quote-specific functions } # Initialize flow manager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, ) @transport.event_handler ( "on_first_participant_joined" ) async def on_first_participant_joined ( transport , participant ): await transport.capture_participant_transcription(participant[ "id" ]) await flow_manager.initialize(create_initial_node()) Best Practices Store shared data in flow_manager.state Create separate functions for node creation Flow Editor The Pipecat Flow Editor provides a visual interface for creating and managing conversation flows. It offers a node-based interface that makes it easier to design, visualize, and modify your flows. Visual Design Node Types Start Node (Green): Entry point of your flow Copy Ask AI "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ] } Flow Nodes (Blue): Intermediate states Copy Ask AI "collect_info" : { "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] } End Node (Red): Final state Copy Ask AI "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [{ "type" : "end_conversation" }] } Function Nodes : Edge Functions (Purple): Create transitions Copy Ask AI { "name" : "next_node" , "description" : "Transition to next state" } Node Functions (Orange): Perform operations Copy Ask AI { "name" : "process_data" , "handler" : process_data_handler, "description" : "Process user data" } Naming Conventions Start Node : Use descriptive names (e.g., “greeting”, “welcome”) Flow Nodes : Name based on purpose (e.g., “collect_info”, “verify_data”) End Node : Conventionally named “end” Functions : Use clear, action-oriented names Function Configuration Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_handler, "description" : "Process user data" , "parameters" : { ... } } } When using the Flow Editor, function handlers can be specified using the __function__: token: Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : "__function__:process_data" , # References function in main script "description" : "Process user data" , "parameters" : { ... } } } The handler will be looked up in your main script when the flow is executed. When function handlers are specified in the flow editor, they will be exported with the __function__: token. Using the Editor Creating a New Flow Start with a descriptively named Start Node Add Flow Nodes for each conversation state Connect nodes using Edge Functions Add Node Functions for operations Include an End Node Import/Export Copy Ask AI # Export format { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ ... ], "task_messages" : [ ... ], "functions" : [ ... ], "pre_actions" : [ ... ] }, "process" : { "task_messages" : [ ... ], "functions" : [ ... ], }, "end" : { "task_messages" : [ ... ], "functions" : [], "post_actions" : [ ... 
] } } } Tips Use the visual preview to verify flow logic Test exported configurations Document node purposes and transitions Keep flows modular and maintainable Try the editor at flows.pipecat.ai OpenAI Audio Models and APIs Overview On this page Key Concepts Example Flows When to Use Static vs Dynamic Flows Installation Core Concepts Designing Conversation Flows Defining a Function Example Function Node Structure Messages Functions Function Configuration Node Functions Edge Functions Actions Pre-Actions Post-Actions Timing Considerations Action Types Deciding Who Speaks First Context Management Context Strategies Configuration Strategy Selection State Management LLM Provider Support OpenAI Format Anthropic Format Google (Gemini) Format Implementation Approaches Static Flows Basic Setup Example FlowConfig Dynamic Flows Example Implementation Best Practices Flow Editor Visual Design Node Types Naming Conventions Function Configuration Using the Editor Creating a New Flow Import/Export Tips Assistant Responses are generated using AI and may contain mistakes.
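To tie several of the pieces above together, here is a condensed sketch of a two-node static flow using a consolidated function handler. It assumes the surrounding Pipecat pipeline objects (task, llm, context_aggregator) already exist as in the Basic Setup example; the node contents and the collect_name handler are illustrative, not part of the official examples.

from pipecat_flows import FlowArgs, FlowManager, FlowResult

async def collect_name(args: FlowArgs, flow_manager: FlowManager) -> tuple[FlowResult, str]:
    """Store the user's name and transition to the farewell node."""
    name = args["name"]
    flow_manager.state["name"] = name  # shared state survives the transition
    return {"status": "success", "name": name}, "farewell"

flow_config = {
    "initial_node": "greeting",
    "nodes": {
        "greeting": {
            "role_messages": [
                {"role": "system", "content": "You are a friendly assistant. Keep responses short."}
            ],
            "task_messages": [
                {"role": "system", "content": "Greet the user and ask for their name."}
            ],
            "functions": [
                {
                    "type": "function",
                    "function": {
                        "name": "collect_name",
                        "handler": collect_name,
                        "description": "Record the user's name",
                        "parameters": {
                            "type": "object",
                            "properties": {"name": {"type": "string"}},
                            "required": ["name"],
                        },
                    },
                }
            ],
        },
        "farewell": {
            "task_messages": [
                {"role": "system", "content": "Thank the user by name and say goodbye."}
            ],
            "functions": [],
            "post_actions": [{"type": "end_conversation"}],
        },
    },
}

flow_manager = FlowManager(
    task=task,
    llm=llm,
    context_aggregator=context_aggregator,
    flow_config=flow_config,
)
await flow_manager.initialize()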
filters_identify-filter_8203164a.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/filters/identify-filter#notes
Title: IdentityFilter - Pipecat
==================================================
IdentityFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frame Filters IdentityFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview IdentityFilter is a simple pass-through processor that forwards all frames without any modification or filtering. It acts as a transparent layer in your pipeline, allowing all frames to flow through unchanged. Check out Observers for an option that delivers similar functionality but doesn’t require a processor to reside in the Pipeline. Constructor Parameters The IdentityFilter constructor accepts no specific parameters beyond those inherited from FrameProcessor . Functionality When a frame passes through the processor, it is immediately forwarded in the same direction with no changes. This applies to all frame types and both directions (upstream and downstream). Use Cases While functionally equivalent to having no filter at all, IdentityFilter can be useful in several scenarios: Testing ParallelPipeline configurations to ensure frames aren’t duplicated Acting as a placeholder where a more complex filter might be added later Monitoring frame flow in pipelines by adding logging in subclasses Creating a base class for more complex conditional filters Usage Example Copy Ask AI from pipecat.processors.filters import IdentityFilter # Create an identity filter pass_through = IdentityFilter() # Add to pipeline pipeline = Pipeline([ source, pass_through, # All frames pass through unchanged destination ]) Frame Flow Notes Simplest possible filter implementation Passes all frames through without modification Useful in testing parallel pipelines Can serve as a placeholder or base class Zero overhead in normal operation FunctionFilter NullFilter On this page Overview Constructor Parameters Functionality Use Cases Usage Example Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
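Building on the Use Cases above (monitoring frame flow by adding logging in a subclass), here is a minimal sketch of such a subclass. The class name is hypothetical, and the sketch assumes the standard FrameProcessor.process_frame(frame, direction) override pattern; verify the imports against your installed Pipecat version.

from loguru import logger

from pipecat.frames.frames import Frame
from pipecat.processors.filters import IdentityFilter
from pipecat.processors.frame_processor import FrameDirection


class LoggingIdentityFilter(IdentityFilter):
    """Pass-through filter that logs every frame it forwards (illustrative)."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        # Log the frame, then let IdentityFilter forward it unchanged
        logger.debug(f"IdentityFilter saw {frame} ({direction})")
        await super().process_frame(frame, direction)


# Drop-in replacement for the plain IdentityFilter in the pipeline above
pass_through = LoggingIdentityFilter()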
filters_stt-mute_a3e0a3c4.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/filters/stt-mute#custom-muting-logic
Title: STTMuteFilter - Pipecat
==================================================
STTMuteFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frame Filters STTMuteFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview STTMuteFilter is a general-purpose processor that combines STT muting and interruption control. When active, it prevents both transcription and interruptions during specified conditions (e.g., bot speech, function calls), providing a cleaner conversation flow. The processor supports multiple simultaneous strategies for when to mute the STT service, making it flexible for different use cases. Want to try it out? Check out the STTMuteFilter foundational demo Constructor Parameters config STTMuteConfig required Configuration object that defines the muting strategies and optional custom logic stt_service Optional[STTService] required The STT service to control (deprecated, will be removed in a future version) Configuration The processor is configured using STTMuteConfig , which determines when and how the STT service should be muted: strategies set[STTMuteStrategy] Set of muting strategies to apply should_mute_callback Callable[[STTMuteFilter], Awaitable[bool]] default: "None" Optional callback for custom muting logic (required when strategy is CUSTOM ) Muting Strategies STTMuteConfig accepts a set of these STTMuteStrategy values: FIRST_SPEECH STTMuteStrategy Mute only during the bot’s first speech (typically during introduction) MUTE_UNTIL_FIRST_BOT_COMPLETE STTMuteStrategy Start muted and remain muted until first bot speech completes. Useful when bot speaks first and you want to ensure its first response cannot be interrupted. FUNCTION_CALL STTMuteStrategy Mute during LLM function calls (e.g., API requests, external service calls) ALWAYS STTMuteStrategy Mute during all bot speech CUSTOM STTMuteStrategy Use custom logic provided via callback to determine when to mute. The callback is invoked when the bot is speaking and can use application state to decide whether to mute. When the bot stops speaking, unmuting occurs automatically if no other strategy requires muting. MUTE_UNTIL_FIRST_BOT_COMPLETE and FIRST_SPEECH strategies should not be used together as they handle the first bot speech differently. 
Input Frames BotStartedSpeakingFrame Frame Indicates bot has started speaking BotStoppedSpeakingFrame Frame Indicates bot has stopped speaking FunctionCallInProgressFrame Frame Indicates a function call has started FunctionCallResultFrame Frame Indicates a function call has completed StartInterruptionFrame Frame User interruption start event (suppressed when muted) StopInterruptionFrame Frame User interruption stop event (suppressed when muted) UserStartedSpeakingFrame Frame Indicates user has started speaking (suppressed when muted) UserStoppedSpeakingFrame Frame Indicates user has stopped speaking (suppressed when muted) Output Frames STTMuteFrame Frame Control frame to mute/unmute the STT service All input frames are passed through except VAD-related frames (interruptions and user speaking events) when muted. Usage Examples Basic Usage (Mute During Bot’s First Speech) Copy Ask AI stt = DeepgramSTTService( api_key = os.getenv( "DEEPGRAM_API_KEY" )) stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = { STTMuteStrategy. FIRST_SPEECH }) ) pipeline = Pipeline([ transport.input(), stt_mute_filter, # Add before STT service stt, # ... rest of pipeline ]) Mute Until First Bot Response Completes Copy Ask AI stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. MUTE_UNTIL_FIRST_BOT_COMPLETE }) ) This ensures no user speech is processed until after the bot’s first complete response. Always Mute During Bot Speech Copy Ask AI stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. ALWAYS }) ) Custom Muting Logic The CUSTOM strategy allows you to control muting based on application state when the bot is speaking. The callback will be invoked whenever the bot is speaking, and your logic decides whether to mute: Copy Ask AI # Create a state manager class SessionState : def __init__ ( self ): self .session_ending = False session_state = SessionState() # Callback function that determines whether to mute async def session_state_mute_logic ( stt_filter : STTMuteFilter) -> bool : # Return True to mute, False otherwise # This is called when the bot is speaking return session_state.session_ending # Configure filter with CUSTOM strategy stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. CUSTOM }, should_mute_callback = session_state_mute_logic ) ) # Later, when you want to trigger muting (e.g., during session timeout): async def handle_session_timeout (): # Update state that will be checked by the callback session_state.session_ending = True # Send goodbye message goodbye_message = "Thank you for using our service. This session is now ending." await pipeline.push_frame(TTSSpeakFrame( text = goodbye_message)) # The system will automatically mute during this message because: # 1. Bot starts speaking, triggering the callback # 2. Callback returns True (session_ending is True) # 3. When bot stops speaking, unmuting happens automatically Combining Multiple Strategies Copy Ask AI async def custom_mute_logic ( processor : STTMuteFilter) -> bool : # Example: Mute during business hours only current_hour = datetime.now().hour return 9 <= current_hour < 17 stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = { STTMuteStrategy. FUNCTION_CALL , # Mute during function calls STTMuteStrategy. CUSTOM , # And during business hours STTMuteStrategy. 
MUTE_UNTIL_FIRST_BOT_COMPLETE # And until first bot speech completes }, should_mute_callback = custom_mute_logic ) ) Frame Flow Notes Combines STT muting and interruption control into a single concept Muting prevents both transcription and interruptions Multiple strategies can be active simultaneously CUSTOM strategy callback is only invoked when the bot is speaking Unmuting happens automatically when bot speech ends (if no other strategy requires muting) Placed before STT service in pipeline Maintains conversation flow during bot speech and function calls Efficient state tracking for minimal overhead NullFilter WakeCheckFilter On this page Overview Constructor Parameters Configuration Muting Strategies Input Frames Output Frames Usage Examples Basic Usage (Mute During Bot’s First Speech) Mute Until First Bot Response Completes Always Mute During Bot Speech Custom Muting Logic Combining Multiple Strategies Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
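For completeness, a minimal sketch of wiring the filter into a pipeline while combining the FIRST_SPEECH and FUNCTION_CALL strategies. The import paths below are assumptions based on Pipecat's current module layout and may differ by version; the transport object is assumed to be set up elsewhere.

import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.filters.stt_mute_filter import (
    STTMuteConfig,
    STTMuteFilter,
    STTMuteStrategy,
)
from pipecat.services.deepgram.stt import DeepgramSTTService

stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

# Mute while the bot gives its introduction and while function calls run
stt_mute_filter = STTMuteFilter(
    config=STTMuteConfig(
        strategies={
            STTMuteStrategy.FIRST_SPEECH,
            STTMuteStrategy.FUNCTION_CALL,
        }
    )
)

pipeline = Pipeline([
    transport.input(),   # transport configured elsewhere
    stt_mute_filter,     # placed before the STT service
    stt,
    # ... context aggregator, LLM, TTS, transport.output()
])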
filters_stt-mute_e3c145a9.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/filters/stt-mute#param-strategies
Title: STTMuteFilter - Pipecat
==================================================
STTMuteFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frame Filters STTMuteFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters FrameFilter FunctionFilter IdentityFilter NullFilter STTMuteFilter WakeCheckFilter WakeNotifierFilter Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Overview STTMuteFilter is a general-purpose processor that combines STT muting and interruption control. When active, it prevents both transcription and interruptions during specified conditions (e.g., bot speech, function calls), providing a cleaner conversation flow. The processor supports multiple simultaneous strategies for when to mute the STT service, making it flexible for different use cases. Want to try it out? Check out the STTMuteFilter foundational demo Constructor Parameters config STTMuteConfig required Configuration object that defines the muting strategies and optional custom logic stt_service Optional[STTService] required The STT service to control (deprecated, will be removed in a future version) Configuration The processor is configured using STTMuteConfig , which determines when and how the STT service should be muted: strategies set[STTMuteStrategy] Set of muting strategies to apply should_mute_callback Callable[[STTMuteFilter], Awaitable[bool]] default: "None" Optional callback for custom muting logic (required when strategy is CUSTOM ) Muting Strategies STTMuteConfig accepts a set of these STTMuteStrategy values: FIRST_SPEECH STTMuteStrategy Mute only during the bot’s first speech (typically during introduction) MUTE_UNTIL_FIRST_BOT_COMPLETE STTMuteStrategy Start muted and remain muted until first bot speech completes. Useful when bot speaks first and you want to ensure its first response cannot be interrupted. FUNCTION_CALL STTMuteStrategy Mute during LLM function calls (e.g., API requests, external service calls) ALWAYS STTMuteStrategy Mute during all bot speech CUSTOM STTMuteStrategy Use custom logic provided via callback to determine when to mute. The callback is invoked when the bot is speaking and can use application state to decide whether to mute. When the bot stops speaking, unmuting occurs automatically if no other strategy requires muting. MUTE_UNTIL_FIRST_BOT_COMPLETE and FIRST_SPEECH strategies should not be used together as they handle the first bot speech differently. 
Input Frames BotStartedSpeakingFrame Frame Indicates bot has started speaking BotStoppedSpeakingFrame Frame Indicates bot has stopped speaking FunctionCallInProgressFrame Frame Indicates a function call has started FunctionCallResultFrame Frame Indicates a function call has completed StartInterruptionFrame Frame User interruption start event (suppressed when muted) StopInterruptionFrame Frame User interruption stop event (suppressed when muted) UserStartedSpeakingFrame Frame Indicates user has started speaking (suppressed when muted) UserStoppedSpeakingFrame Frame Indicates user has stopped speaking (suppressed when muted) Output Frames STTMuteFrame Frame Control frame to mute/unmute the STT service All input frames are passed through except VAD-related frames (interruptions and user speaking events) when muted. Usage Examples Basic Usage (Mute During Bot’s First Speech) Copy Ask AI stt = DeepgramSTTService( api_key = os.getenv( "DEEPGRAM_API_KEY" )) stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = { STTMuteStrategy. FIRST_SPEECH }) ) pipeline = Pipeline([ transport.input(), stt_mute_filter, # Add before STT service stt, # ... rest of pipeline ]) Mute Until First Bot Response Completes Copy Ask AI stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. MUTE_UNTIL_FIRST_BOT_COMPLETE }) ) This ensures no user speech is processed until after the bot’s first complete response. Always Mute During Bot Speech Copy Ask AI stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. ALWAYS }) ) Custom Muting Logic The CUSTOM strategy allows you to control muting based on application state when the bot is speaking. The callback will be invoked whenever the bot is speaking, and your logic decides whether to mute: Copy Ask AI # Create a state manager class SessionState : def __init__ ( self ): self .session_ending = False session_state = SessionState() # Callback function that determines whether to mute async def session_state_mute_logic ( stt_filter : STTMuteFilter) -> bool : # Return True to mute, False otherwise # This is called when the bot is speaking return session_state.session_ending # Configure filter with CUSTOM strategy stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = {STTMuteStrategy. CUSTOM }, should_mute_callback = session_state_mute_logic ) ) # Later, when you want to trigger muting (e.g., during session timeout): async def handle_session_timeout (): # Update state that will be checked by the callback session_state.session_ending = True # Send goodbye message goodbye_message = "Thank you for using our service. This session is now ending." await pipeline.push_frame(TTSSpeakFrame( text = goodbye_message)) # The system will automatically mute during this message because: # 1. Bot starts speaking, triggering the callback # 2. Callback returns True (session_ending is True) # 3. When bot stops speaking, unmuting happens automatically Combining Multiple Strategies Copy Ask AI async def custom_mute_logic ( processor : STTMuteFilter) -> bool : # Example: Mute during business hours only current_hour = datetime.now().hour return 9 <= current_hour < 17 stt_mute_filter = STTMuteFilter( config = STTMuteConfig( strategies = { STTMuteStrategy. FUNCTION_CALL , # Mute during function calls STTMuteStrategy. CUSTOM , # And during business hours STTMuteStrategy. 
MUTE_UNTIL_FIRST_BOT_COMPLETE # And until first bot speech completes }, should_mute_callback = custom_mute_logic ) ) Frame Flow Notes Combines STT muting and interruption control into a single concept Muting prevents both transcription and interruptions Multiple strategies can be active simultaneously CUSTOM strategy callback is only invoked when the bot is speaking Unmuting happens automatically when bot speech ends (if no other strategy requires muting) Placed before STT service in pipeline Maintains conversation flow during bot speech and function calls Efficient state tracking for minimal overhead NullFilter WakeCheckFilter On this page Overview Constructor Parameters Configuration Muting Strategies Input Frames Output Frames Usage Examples Basic Usage (Mute During Bot’s First Speech) Mute Until First Bot Response Completes Always Mute During Bot Speech Custom Muting Logic Combining Multiple Strategies Frame Flow Notes Assistant Responses are generated using AI and may contain mistakes.
flows_pipecat-flows_2ded68ca.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/frameworks/flows/pipecat-flows#param-role-messages
Title: Pipecat Flows - Pipecat
==================================================
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frameworks Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline New to building conversational flows? Check out our Pipecat Flows guide first. Installation Existing Pipecat installation Fresh Pipecat installation Copy Ask AI pip install pipecat-ai-flows Core Types FlowArgs FlowArgs Dict[str, Any] Type alias for function handler arguments. FlowResult FlowResult TypedDict Base type for function handler results. Additional fields can be included as needed. Show Fields status str Optional status field error str Optional error message FlowConfig FlowConfig TypedDict Configuration for the entire conversation flow. Show Fields initial_node str required Starting node identifier nodes Dict[str, NodeConfig] required Map of node names to configurations NodeConfig NodeConfig TypedDict Configuration for a single node in the flow. Show Fields name str The name of the node, used in debug logging in dynamic flows. If no name is specified, an automatically-generated UUID is used. Copy Ask AI # Example name "name" : "greeting" role_messages List[dict] Defines the role or persona of the LLM. Required for the initial node and optional for subsequent nodes. Copy Ask AI # Example role messages "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant..." } ], task_messages List[dict] required Defines the task for a given node. Required for all nodes. Copy Ask AI # Example task messages "task_messages" : [ { "role" : "system" , # May be `user` depending on the LLM "content" : "Ask the user for their name..." } ], context_strategy ContextStrategyConfig Strategy for managing context during transitions to this node. Copy Ask AI # Example context strategy configuration "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) functions List[Union[dict, FlowsFunctionSchema]] required LLM function / tool call configurations, defined in one of the supported formats . Copy Ask AI # Using provider-specific dictionary format "functions" : [ { "type" : "function" , "function" : { "name" : "get_current_movies" , "handler" : get_movies, "description" : "Fetch movies currently playing" , "parameters" : { ... } }, } ] # Using FlowsFunctionSchema "functions" : [ FlowsFunctionSchema( name = "get_current_movies" , description = "Fetch movies currently playing" , properties = { ... }, required = [ ... ], handler = get_movies ) ] # Using direct functions (auto-configuration) "functions" : [get_movies] pre_actions List[dict] Actions that execute before the LLM inference. For example, you can send a message to the TTS to speak a phrase (e.g. “Hold on a moment…”), which may be effective if an LLM function call takes time to execute. 
Copy Ask AI # Example pre_actions "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." } ], post_actions List[dict] Actions that execute after the LLM inference. For example, you can end the conversation. Copy Ask AI # Example post_actions "post_actions" : [ { "type" : "end_conversation" } ] respond_immediately bool If set to False , the LLM will not respond immediately when the node is set, but will instead wait for the user to speak first before responding. Defaults to True . Copy Ask AI # Example usage "respond_immediately" : False Function Handler Types LegacyFunctionHandler Callable[[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]] Legacy function handler that only receives arguments. Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) FlowFunctionHandler Callable[[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]] Modern function handler that receives both arguments and FlowManager . Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) DirectFunction DirectFunction Function that is meant to be passed directly into a NodeConfig rather than into the handler field of a function configuration. It must be an async function with flow_manager: FlowManager as its first parameter. It must return a ConsolidatedFunctionResult , which is a tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) ContextStrategy ContextStrategy Enum Strategy for managing conversation context during node transitions. Show Values APPEND str Default strategy. Adds new messages to existing context. RESET str Clears context and starts fresh with new messages. RESET_WITH_SUMMARY str Resets context but includes an AI-generated summary. ContextStrategyConfig ContextStrategyConfig dataclass Configuration for context management strategy. Show Fields strategy ContextStrategy required The strategy to use for context management summary_prompt Optional[str] Required when using RESET_WITH_SUMMARY. Prompt text for generating the conversation summary. Copy Ask AI # Example usage config = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) FlowsFunctionSchema FlowsFunctionSchema class A standardized schema for defining functions in Pipecat Flows with flow-specific properties. Show Constructor Parameters name str required Name of the function description str required Description of the function’s purpose properties Dict[str, Any] required Dictionary defining properties types and descriptions required List[str] required List of required parameter names handler Optional[FunctionHandler] Function handler to process the function call transition_to Optional[str] deprecated Target node to transition to after function execution Deprecated: instead of transition_to , use a “consolidated” handler that returns a tuple (result, next node). transition_callback Optional[Callable] deprecated Callback function for dynamic transitions Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). 
You cannot specify both transition_to and transition_callback in the same function schema. Example usage: Copy Ask AI from pipecat_flows import FlowsFunctionSchema # Define a function schema collect_name_schema = FlowsFunctionSchema( name = "collect_name" , description = "Record the user's name" , properties = { "name" : { "type" : "string" , "description" : "The user's name" } }, required = [ "name" ], handler = collect_name_handler ) # Use in node configuration node_config = { "name" : "greeting" , "task_messages" : [ { "role" : "system" , "content" : "Ask the user for their name." } ], "functions" : [collect_name_schema] } # Pass to flow manager await flow_manager.set_node_from_config(node_config) FlowManager FlowManager class Main class for managing conversation flows, supporting both static (configuration-driven) and dynamic (runtime-determined) flows. Show Constructor Parameters task PipelineTask required Pipeline task for frame queueing llm LLMService required LLM service instance (OpenAI, Anthropic, or Google). Must be initialized with the corresponding pipecat-ai provider dependency installed. context_aggregator Any required Context aggregator used for pushing messages to the LLM service tts Optional[Any] deprecated Optional TTS service for voice actions. Deprecated: No need to explicitly pass tts to FlowManager in order to use tts_say actions. flow_config Optional[FlowConfig] Optional static flow configuration context_strategy Optional[ContextStrategyConfig] Optional configuration for how context should be managed during transitions. Defaults to APPEND strategy if not specified. Methods initialize method Initialize the flow with starting messages. Show Parameters initial_node NodeConfig The initial conversation node (needed for dynamic flows only). If not specified, you’ll need to call set_node_from_config() to kick off the conversation. Show Raises FlowInitializationError If initialization fails set_node method deprecated Set up a new conversation node programmatically (dynamic flows only). In dynamic flows, the application advances the conversation using set_node to set up each next node. In static flows, set_node is triggered under the hood when a node contains a transition_to field. Deprecated: use the following patterns instead of set_node : Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() If you really need to set a node explicitly, use set_node_from_config() (note: its name will be read from its NodeConfig ) Show Parameters node_id str required Identifier for the new node node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails set_node_from_config method Set up a new conversation node programmatically (dynamic flows only). Note that this method should only be used in rare circumstances. Most often, you should: Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() Show Parameters node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails register_action method Register a handler for a custom action type. 
Show Parameters action_type str required String identifier for the action handler Callable required Async or sync function that handles the action get_current_context method Get the current conversation context. Returns a list of messages in the current context, including system messages, user messages, and assistant responses. Show Returns messages List[dict] List of messages in the current context Show Raises FlowError If context aggregator is not available Example usage: Copy Ask AI # Access current conversation context context = flow_manager.get_current_context() # Use in handlers async def process_response ( args : FlowArgs) -> tuple[FlowResult, str ]: context = flow_manager.get_current_context() # Process conversation history return { "status" : "success" }, "next" State Management The FlowManager provides a state dictionary for storing conversation data: Access state Access in transitions Copy Ask AI flow_manager.state: Dict[ str , Any] # Store data flow_manager.state[ "user_age" ] = 25 Usage Examples Static Flow Dynamic Flow Copy Ask AI flow_config: FlowConfig = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant. Your responses will be converted to audio." } ], "task_messages" : [ { "role" : "system" , "content" : "Start by greeting the user and asking for their name." } ], "functions" : [{ "type" : "function" , "function" : { "name" : "collect_name" , "handler" : collect_name_handler, "description" : "Record user's name" , "parameters" : { ... } } }] } } } # Create and initialize the FlowManager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) # Initialize the flow_manager to start the conversation await flow_manager.initialize() Node Functions concept Functions that execute operations within a single conversational state, without switching nodes. Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def process_data ( args : FlowArgs) -> tuple[FlowResult, None ]: """Handle data processing within a node.""" data = args[ "data" ] result = await process(data) return { "status" : "success" , "processed_data" : result }, None # Function configuration { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, "description" : "Process user data" , "parameters" : { "type" : "object" , "properties" : { "data" : { "type" : "string" } } } } } Edge Functions concept Functions that specify a transition between nodes (optionally processing data first). Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def next_step ( args : FlowArgs) -> tuple[ None , str ]: """Specify the next node to transition to.""" return None , "target_node" # Return NodeConfig instead of str for dynamic flows # Function configuration { "type" : "function" , "function" : { "name" : "next_step" , "handler" : next_step, "description" : "Transition to next node" , "parameters" : { "type" : "object" , "properties" : {}} } } Function Properties handler Optional[Callable] Async function that processes data within a node and/or specifies the next node ( more details here ). 
Can be specified as: Direct function reference Either a Callable function or a string with __function__: prefix (e.g., "__function__:process_data" ) to reference a function in the main script Direct Reference Function Token Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, # Callable function "parameters" : { ... } } } transition_callback Optional[Callable] deprecated Handler for dynamic flow transitions. Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). Must be an async function with one of these signatures: Copy Ask AI # New style (recommended) async def handle_transition ( args : Dict[ str , Any], result : FlowResult, flow_manager : FlowManager ) -> None : """Handle transition to next node.""" if result.available: # Type-safe access to result await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Legacy style (supported for backwards compatibility) async def handle_transition ( args : Dict[ str , Any], flow_manager : FlowManager ) -> None : """Handle transition to next node.""" await flow_manager.set_node_from_config(create_next_node()) The callback receives: args : Arguments from the function call result : Typed result from the function handler (new style only) flow_manager : Reference to the FlowManager instance Example usage: Copy Ask AI async def handle_availability_check ( args : Dict, result : TimeResult, # Typed result flow_manager : FlowManager ): """Handle availability check and transition based on result.""" if result.available: await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Use in function configuration { "type" : "function" , "function" : { "name" : "check_availability" , "handler" : check_availability, "parameters" : { ... }, "transition_callback" : handle_availability_check } } Note: A function cannot have both transition_to and transition_callback . Handler Signatures Function handlers passed as a handler in a function configuration can be defined with three different signatures: Modern (Args + FlowManager) Legacy (Args Only) No Arguments Copy Ask AI async def handler_with_flow_manager ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Modern handler that receives both arguments and FlowManager access.""" # Access state previous_data = flow_manager.state.get( "stored_data" ) # Access pipeline resources await flow_manager.task.queue_frame(TTSSpeakFrame( "Processing your request..." )) # Store data in state for later flow_manager.state[ "new_data" ] = args[ "input" ] return { "status" : "success" , "result" : "Processed with flow access" }, create_next_node() The framework automatically detects which signature your handler is using and calls it appropriately. If you’re passing your function directly into your NodeConfig rather than as a handler in a function configuration, you’d use a somewhat different signature: Direct Copy Ask AI async def do_something ( flow_manager : FlowManager, foo : int , bar : str = "" ) -> tuple[FlowResult, NodeConfig]: """ Do something interesting. Args: foo (int): The foo to do something interesting with. bar (string): The bar to do something interesting with. Defaults to empty string. 
""" result = await fetch_data(foo, bar) next_node = create_end_node() return result, next_node Return Types Success Response Error Response Copy Ask AI { "status" : "success" , "data" : "some data" # Optional additional data } Provider-Specific Formats You don’t need to handle these format differences manually - use the standard format and the FlowManager will adapt it for your chosen provider. OpenAI Anthropic Google (Gemini) Copy Ask AI { "type" : "function" , "function" : { "name" : "function_name" , "handler" : handler, "description" : "Description" , "parameters" : { ... } } } Actions pre_actions and post_actions are used to manage conversation flow. They are included in the NodeConfig and executed before and after the LLM completion, respectively. Three kinds of actions are available: Pre-canned actions: These actions perform common tasks and require little configuration. Function actions: These actions run developer-defined functions at the appropriate time. Custom actions: These are fully developer-defined actions, providing flexibility at the expense of complexity. Pre-canned Actions Common actions shipped with Flows for managing conversation flow. To use them, just add them to your NodeConfig . tts_say action Speaks text immediately using the TTS service. Copy Ask AI Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Processing your request..." # Required } ] end_conversation action Ends the conversation and closes the connection. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "end_conversation" , "text" : "Goodbye!" # Optional farewell message } ] Function Actions Actions that run developer-defined functions at the appropriate time. For example, if used in post_actions , they’ll run after the bot has finished talking and after any previous post_actions have finished. function action Calls the developer-defined function at the appropriate time. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "function" , "handler" : bot_turn_ended # Required } ] Custom Actions Fully developer-defined actions, providing flexibility at the expense of complexity. Here’s the complexity: because these actions aren’t queued in the Pipecat pipeline, they may execute seemingly early if used in post_actions ; they’ll run immediately after the LLM completion is triggered but won’t wait around for the bot to finish talking. Why would you want this behavior? You might be writing an action that: Itself just queues another Frame into the Pipecat pipeline (meaning there would no benefit to waiting around for sequencing purposes) Does work that can be done a bit sooner, like logging that the LLM was updated Custom actions are composed of at least: type str required String identifier for the action handler Callable required Async or sync function that handles the action Example: Copy Ask AI Copy Ask AI # Define custom action handler async def custom_notification ( action : dict , flow_manager : FlowManager): """Custom action handler.""" message = action.get( "message" , "" ) await notify_user(message) # Use in node configuration "pre_actions" : [ { "type" : "notify" , "handler" : send_notification, "message" : "Attention!" , } ] Exceptions FlowError exception Base exception for all flow-related errors. Copy Ask AI Copy Ask AI from pipecat_flows import FlowError try : await flow_manager.set_node_from_config(config) except FlowError as e: print ( f "Flow error: { e } " ) FlowInitializationError exception Raised when flow initialization fails. 
Copy Ask AI Copy Ask AI from pipecat_flows import FlowInitializationError try : await flow_manager.initialize() except FlowInitializationError as e: print ( f "Initialization failed: { e } " ) FlowTransitionError exception Raised when a state transition fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowTransitionError try : await flow_manager.set_node_from_config(node_config) except FlowTransitionError as e: print ( f "Transition failed: { e } " ) InvalidFunctionError exception Raised when an invalid or unavailable function is specified. Copy Ask AI Copy Ask AI from pipecat_flows import InvalidFunctionError try : await flow_manager.set_node_from_config({ "functions" : [{ "type" : "function" , "function" : { "name" : "invalid_function" } }] }) except InvalidFunctionError as e: print ( f "Invalid function: { e } " )
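Putting the pieces above together, here is a minimal sketch of a dynamic flow that uses a direct function returning a consolidated (result, next node) tuple and passes the initial node to initialize(). It assumes task, llm, and context_aggregator have already been created as in the usage examples; the node contents and the collect_name function are illustrative.

from pipecat_flows import FlowManager, FlowResult, NodeConfig

async def collect_name(flow_manager: FlowManager, name: str) -> tuple[FlowResult, NodeConfig]:
    """Record the user's name, then move to the end node.

    Args:
        name (str): The user's name.
    """
    flow_manager.state["name"] = name
    return {"status": "success", "name": name}, create_end_node()

def create_initial_node() -> NodeConfig:
    return {
        "name": "greeting",
        "role_messages": [
            {"role": "system", "content": "You are a helpful assistant. Your responses will be converted to audio."}
        ],
        "task_messages": [
            {"role": "system", "content": "Greet the user and ask for their name."}
        ],
        "functions": [collect_name],  # direct function (auto-configuration)
    }

def create_end_node() -> NodeConfig:
    return {
        "name": "end",
        "task_messages": [{"role": "system", "content": "Thank the user and say goodbye."}],
        "functions": [],
        "post_actions": [{"type": "end_conversation"}],
    }

# task, llm, and context_aggregator are assumed to already exist
flow_manager = FlowManager(task=task, llm=llm, context_aggregator=context_aggregator)
await flow_manager.initialize(create_initial_node())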
|
flows_pipecat-flows_3679056b.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/frameworks/flows/pipecat-flows#param-properties
|
2 |
+
Title: Pipecat Flows - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frameworks Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline New to building conversational flows? Check out our Pipecat Flows guide first. Installation Existing Pipecat installation Fresh Pipecat installation Copy Ask AI pip install pipecat-ai-flows Core Types FlowArgs FlowArgs Dict[str, Any] Type alias for function handler arguments. FlowResult FlowResult TypedDict Base type for function handler results. Additional fields can be included as needed. Show Fields status str Optional status field error str Optional error message FlowConfig FlowConfig TypedDict Configuration for the entire conversation flow. Show Fields initial_node str required Starting node identifier nodes Dict[str, NodeConfig] required Map of node names to configurations NodeConfig NodeConfig TypedDict Configuration for a single node in the flow. Show Fields name str The name of the node, used in debug logging in dynamic flows. If no name is specified, an automatically-generated UUID is used. Copy Ask AI # Example name "name" : "greeting" role_messages List[dict] Defines the role or persona of the LLM. Required for the initial node and optional for subsequent nodes. Copy Ask AI # Example role messages "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant..." } ], task_messages List[dict] required Defines the task for a given node. Required for all nodes. Copy Ask AI # Example task messages "task_messages" : [ { "role" : "system" , # May be `user` depending on the LLM "content" : "Ask the user for their name..." } ], context_strategy ContextStrategyConfig Strategy for managing context during transitions to this node. Copy Ask AI # Example context strategy configuration "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) functions List[Union[dict, FlowsFunctionSchema]] required LLM function / tool call configurations, defined in one of the supported formats . Copy Ask AI # Using provider-specific dictionary format "functions" : [ { "type" : "function" , "function" : { "name" : "get_current_movies" , "handler" : get_movies, "description" : "Fetch movies currently playing" , "parameters" : { ... } }, } ] # Using FlowsFunctionSchema "functions" : [ FlowsFunctionSchema( name = "get_current_movies" , description = "Fetch movies currently playing" , properties = { ... }, required = [ ... ], handler = get_movies ) ] # Using direct functions (auto-configuration) "functions" : [get_movies] pre_actions List[dict] Actions that execute before the LLM inference. For example, you can send a message to the TTS to speak a phrase (e.g. “Hold on a moment…”), which may be effective if an LLM function call takes time to execute. 
Copy Ask AI # Example pre_actions "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." } ], post_actions List[dict] Actions that execute after the LLM inference. For example, you can end the conversation. Copy Ask AI # Example post_actions "post_actions" : [ { "type" : "end_conversation" } ] respond_immediately bool If set to False , the LLM will not respond immediately when the node is set, but will instead wait for the user to speak first before responding. Defaults to True . Copy Ask AI # Example usage "respond_immediately" : False Function Handler Types LegacyFunctionHandler Callable[[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]] Legacy function handler that only receives arguments. Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) FlowFunctionHandler Callable[[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]] Modern function handler that receives both arguments and FlowManager . Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) DirectFunction DirectFunction Function that is meant to be passed directly into a NodeConfig rather than into the handler field of a function configuration. It must be an async function with flow_manager: FlowManager as its first parameter. It must return a ConsolidatedFunctionResult , which is a tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) ContextStrategy ContextStrategy Enum Strategy for managing conversation context during node transitions. Show Values APPEND str Default strategy. Adds new messages to existing context. RESET str Clears context and starts fresh with new messages. RESET_WITH_SUMMARY str Resets context but includes an AI-generated summary. ContextStrategyConfig ContextStrategyConfig dataclass Configuration for context management strategy. Show Fields strategy ContextStrategy required The strategy to use for context management summary_prompt Optional[str] Required when using RESET_WITH_SUMMARY. Prompt text for generating the conversation summary. Copy Ask AI # Example usage config = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) FlowsFunctionSchema FlowsFunctionSchema class A standardized schema for defining functions in Pipecat Flows with flow-specific properties. Show Constructor Parameters name str required Name of the function description str required Description of the function’s purpose properties Dict[str, Any] required Dictionary defining properties types and descriptions required List[str] required List of required parameter names handler Optional[FunctionHandler] Function handler to process the function call transition_to Optional[str] deprecated Target node to transition to after function execution Deprecated: instead of transition_to , use a “consolidated” handler that returns a tuple (result, next node). transition_callback Optional[Callable] deprecated Callback function for dynamic transitions Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). 
You cannot specify both transition_to and transition_callback in the same function schema. Example usage: Copy Ask AI from pipecat_flows import FlowsFunctionSchema # Define a function schema collect_name_schema = FlowsFunctionSchema( name = "collect_name" , description = "Record the user's name" , properties = { "name" : { "type" : "string" , "description" : "The user's name" } }, required = [ "name" ], handler = collect_name_handler ) # Use in node configuration node_config = { "name" : "greeting" , "task_messages" : [ { "role" : "system" , "content" : "Ask the user for their name." } ], "functions" : [collect_name_schema] } # Pass to flow manager await flow_manager.set_node_from_config(node_config) FlowManager FlowManager class Main class for managing conversation flows, supporting both static (configuration-driven) and dynamic (runtime-determined) flows. Show Constructor Parameters task PipelineTask required Pipeline task for frame queueing llm LLMService required LLM service instance (OpenAI, Anthropic, or Google). Must be initialized with the corresponding pipecat-ai provider dependency installed. context_aggregator Any required Context aggregator used for pushing messages to the LLM service tts Optional[Any] deprecated Optional TTS service for voice actions. Deprecated: No need to explicitly pass tts to FlowManager in order to use tts_say actions. flow_config Optional[FlowConfig] Optional static flow configuration context_strategy Optional[ContextStrategyConfig] Optional configuration for how context should be managed during transitions. Defaults to APPEND strategy if not specified. Methods initialize method Initialize the flow with starting messages. Show Parameters initial_node NodeConfig The initial conversation node (needed for dynamic flows only). If not specified, you’ll need to call set_node_from_config() to kick off the conversation. Show Raises FlowInitializationError If initialization fails set_node method deprecated Set up a new conversation node programmatically (dynamic flows only). In dynamic flows, the application advances the conversation using set_node to set up each next node. In static flows, set_node is triggered under the hood when a node contains a transition_to field. Deprecated: use the following patterns instead of set_node : Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() If you really need to set a node explicitly, use set_node_from_config() (note: its name will be read from its NodeConfig ) Show Parameters node_id str required Identifier for the new node node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails set_node_from_config method Set up a new conversation node programmatically (dynamic flows only). Note that this method should only be used in rare circumstances. Most often, you should: Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() Show Parameters node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails register_action method Register a handler for a custom action type. 
Show Parameters action_type str required String identifier for the action handler Callable required Async or sync function that handles the action get_current_context method Get the current conversation context. Returns a list of messages in the current context, including system messages, user messages, and assistant responses. Show Returns messages List[dict] List of messages in the current context Show Raises FlowError If context aggregator is not available Example usage: Copy Ask AI # Access current conversation context context = flow_manager.get_current_context() # Use in handlers async def process_response ( args : FlowArgs) -> tuple[FlowResult, str ]: context = flow_manager.get_current_context() # Process conversation history return { "status" : "success" }, "next" State Management The FlowManager provides a state dictionary for storing conversation data: Access state Access in transitions Copy Ask AI flow_manager.state: Dict[ str , Any] # Store data flow_manager.state[ "user_age" ] = 25 Usage Examples Static Flow Dynamic Flow Copy Ask AI flow_config: FlowConfig = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant. Your responses will be converted to audio." } ], "task_messages" : [ { "role" : "system" , "content" : "Start by greeting the user and asking for their name." } ], "functions" : [{ "type" : "function" , "function" : { "name" : "collect_name" , "handler" : collect_name_handler, "description" : "Record user's name" , "parameters" : { ... } } }] } } } # Create and initialize the FlowManager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) # Initialize the flow_manager to start the conversation await flow_manager.initialize() Node Functions concept Functions that execute operations within a single conversational state, without switching nodes. Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def process_data ( args : FlowArgs) -> tuple[FlowResult, None ]: """Handle data processing within a node.""" data = args[ "data" ] result = await process(data) return { "status" : "success" , "processed_data" : result }, None # Function configuration { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, "description" : "Process user data" , "parameters" : { "type" : "object" , "properties" : { "data" : { "type" : "string" } } } } } Edge Functions concept Functions that specify a transition between nodes (optionally processing data first). Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def next_step ( args : FlowArgs) -> tuple[ None , str ]: """Specify the next node to transition to.""" return None , "target_node" # Return NodeConfig instead of str for dynamic flows # Function configuration { "type" : "function" , "function" : { "name" : "next_step" , "handler" : next_step, "description" : "Transition to next node" , "parameters" : { "type" : "object" , "properties" : {}} } } Function Properties handler Optional[Callable] Async function that processes data within a node and/or specifies the next node ( more details here ). 
Can be specified as: Direct function reference Either a Callable function or a string with __function__: prefix (e.g., "__function__:process_data" ) to reference a function in the main script Direct Reference Function Token Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, # Callable function "parameters" : { ... } } } transition_callback Optional[Callable] deprecated Handler for dynamic flow transitions. Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). Must be an async function with one of these signatures: Copy Ask AI # New style (recommended) async def handle_transition ( args : Dict[ str , Any], result : FlowResult, flow_manager : FlowManager ) -> None : """Handle transition to next node.""" if result.available: # Type-safe access to result await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Legacy style (supported for backwards compatibility) async def handle_transition ( args : Dict[ str , Any], flow_manager : FlowManager ) -> None : """Handle transition to next node.""" await flow_manager.set_node_from_config(create_next_node()) The callback receives: args : Arguments from the function call result : Typed result from the function handler (new style only) flow_manager : Reference to the FlowManager instance Example usage: Copy Ask AI async def handle_availability_check ( args : Dict, result : TimeResult, # Typed result flow_manager : FlowManager ): """Handle availability check and transition based on result.""" if result.available: await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Use in function configuration { "type" : "function" , "function" : { "name" : "check_availability" , "handler" : check_availability, "parameters" : { ... }, "transition_callback" : handle_availability_check } } Note: A function cannot have both transition_to and transition_callback . Handler Signatures Function handlers passed as a handler in a function configuration can be defined with three different signatures: Modern (Args + FlowManager) Legacy (Args Only) No Arguments Copy Ask AI async def handler_with_flow_manager ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Modern handler that receives both arguments and FlowManager access.""" # Access state previous_data = flow_manager.state.get( "stored_data" ) # Access pipeline resources await flow_manager.task.queue_frame(TTSSpeakFrame( "Processing your request..." )) # Store data in state for later flow_manager.state[ "new_data" ] = args[ "input" ] return { "status" : "success" , "result" : "Processed with flow access" }, create_next_node() The framework automatically detects which signature your handler is using and calls it appropriately. If you’re passing your function directly into your NodeConfig rather than as a handler in a function configuration, you’d use a somewhat different signature: Direct Copy Ask AI async def do_something ( flow_manager : FlowManager, foo : int , bar : str = "" ) -> tuple[FlowResult, NodeConfig]: """ Do something interesting. Args: foo (int): The foo to do something interesting with. bar (string): The bar to do something interesting with. Defaults to empty string. 
""" result = await fetch_data(foo, bar) next_node = create_end_node() return result, next_node Return Types Success Response Error Response Copy Ask AI { "status" : "success" , "data" : "some data" # Optional additional data } Provider-Specific Formats You don’t need to handle these format differences manually - use the standard format and the FlowManager will adapt it for your chosen provider. OpenAI Anthropic Google (Gemini) Copy Ask AI { "type" : "function" , "function" : { "name" : "function_name" , "handler" : handler, "description" : "Description" , "parameters" : { ... } } } Actions pre_actions and post_actions are used to manage conversation flow. They are included in the NodeConfig and executed before and after the LLM completion, respectively. Three kinds of actions are available: Pre-canned actions: These actions perform common tasks and require little configuration. Function actions: These actions run developer-defined functions at the appropriate time. Custom actions: These are fully developer-defined actions, providing flexibility at the expense of complexity. Pre-canned Actions Common actions shipped with Flows for managing conversation flow. To use them, just add them to your NodeConfig . tts_say action Speaks text immediately using the TTS service. Copy Ask AI Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Processing your request..." # Required } ] end_conversation action Ends the conversation and closes the connection. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "end_conversation" , "text" : "Goodbye!" # Optional farewell message } ] Function Actions Actions that run developer-defined functions at the appropriate time. For example, if used in post_actions , they’ll run after the bot has finished talking and after any previous post_actions have finished. function action Calls the developer-defined function at the appropriate time. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "function" , "handler" : bot_turn_ended # Required } ] Custom Actions Fully developer-defined actions, providing flexibility at the expense of complexity. Here’s the complexity: because these actions aren’t queued in the Pipecat pipeline, they may execute seemingly early if used in post_actions ; they’ll run immediately after the LLM completion is triggered but won’t wait around for the bot to finish talking. Why would you want this behavior? You might be writing an action that: Itself just queues another Frame into the Pipecat pipeline (meaning there would no benefit to waiting around for sequencing purposes) Does work that can be done a bit sooner, like logging that the LLM was updated Custom actions are composed of at least: type str required String identifier for the action handler Callable required Async or sync function that handles the action Example: Copy Ask AI Copy Ask AI # Define custom action handler async def custom_notification ( action : dict , flow_manager : FlowManager): """Custom action handler.""" message = action.get( "message" , "" ) await notify_user(message) # Use in node configuration "pre_actions" : [ { "type" : "notify" , "handler" : send_notification, "message" : "Attention!" , } ] Exceptions FlowError exception Base exception for all flow-related errors. Copy Ask AI Copy Ask AI from pipecat_flows import FlowError try : await flow_manager.set_node_from_config(config) except FlowError as e: print ( f "Flow error: { e } " ) FlowInitializationError exception Raised when flow initialization fails. 
Copy Ask AI Copy Ask AI from pipecat_flows import FlowInitializationError try : await flow_manager.initialize() except FlowInitializationError as e: print ( f "Initialization failed: { e } " ) FlowTransitionError exception Raised when a state transition fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowTransitionError try : await flow_manager.set_node_from_config(node_config) except FlowTransitionError as e: print ( f "Transition failed: { e } " ) InvalidFunctionError exception Raised when an invalid or unavailable function is specified. Copy Ask AI Copy Ask AI from pipecat_flows import InvalidFunctionError try : await flow_manager.set_node_from_config({ "functions" : [{ "type" : "function" , "function" : { "name" : "invalid_function" } }] }) except InvalidFunctionError as e: print ( f "Invalid function: { e } " )
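As a concrete sketch of the per-node options described above, the node below resets the accumulated context with an AI-generated summary when it is entered and waits for the user to speak before the LLM responds (the node name and task message are illustrative):

from pipecat_flows import ContextStrategy, ContextStrategyConfig, NodeConfig

follow_up_node: NodeConfig = {
    "name": "follow_up",
    "task_messages": [
        {"role": "system", "content": "Ask the user two brief follow-up questions."}
    ],
    "functions": [],
    # Replace the accumulated context with an AI-generated summary on entry
    "context_strategy": ContextStrategyConfig(
        strategy=ContextStrategy.RESET_WITH_SUMMARY,
        summary_prompt="Summarize the key points discussed so far.",
    ),
    # Wait for the user to speak first instead of responding immediately
    "respond_immediately": False,
}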
|
flows_pipecat-flows_ac3a5ae7.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/frameworks/flows/pipecat-flows#param-flow-error-3
|
2 |
+
Title: Pipecat Flows - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frameworks Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline New to building conversational flows? Check out our Pipecat Flows guide first. Installation Existing Pipecat installation Fresh Pipecat installation Copy Ask AI pip install pipecat-ai-flows Core Types FlowArgs FlowArgs Dict[str, Any] Type alias for function handler arguments. FlowResult FlowResult TypedDict Base type for function handler results. Additional fields can be included as needed. Show Fields status str Optional status field error str Optional error message FlowConfig FlowConfig TypedDict Configuration for the entire conversation flow. Show Fields initial_node str required Starting node identifier nodes Dict[str, NodeConfig] required Map of node names to configurations NodeConfig NodeConfig TypedDict Configuration for a single node in the flow. Show Fields name str The name of the node, used in debug logging in dynamic flows. If no name is specified, an automatically-generated UUID is used. Copy Ask AI # Example name "name" : "greeting" role_messages List[dict] Defines the role or persona of the LLM. Required for the initial node and optional for subsequent nodes. Copy Ask AI # Example role messages "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant..." } ], task_messages List[dict] required Defines the task for a given node. Required for all nodes. Copy Ask AI # Example task messages "task_messages" : [ { "role" : "system" , # May be `user` depending on the LLM "content" : "Ask the user for their name..." } ], context_strategy ContextStrategyConfig Strategy for managing context during transitions to this node. Copy Ask AI # Example context strategy configuration "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) functions List[Union[dict, FlowsFunctionSchema]] required LLM function / tool call configurations, defined in one of the supported formats . Copy Ask AI # Using provider-specific dictionary format "functions" : [ { "type" : "function" , "function" : { "name" : "get_current_movies" , "handler" : get_movies, "description" : "Fetch movies currently playing" , "parameters" : { ... } }, } ] # Using FlowsFunctionSchema "functions" : [ FlowsFunctionSchema( name = "get_current_movies" , description = "Fetch movies currently playing" , properties = { ... }, required = [ ... ], handler = get_movies ) ] # Using direct functions (auto-configuration) "functions" : [get_movies] pre_actions List[dict] Actions that execute before the LLM inference. For example, you can send a message to the TTS to speak a phrase (e.g. “Hold on a moment…”), which may be effective if an LLM function call takes time to execute. 
Copy Ask AI # Example pre_actions "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." } ], post_actions List[dict] Actions that execute after the LLM inference. For example, you can end the conversation. Copy Ask AI # Example post_actions "post_actions" : [ { "type" : "end_conversation" } ] respond_immediately bool If set to False , the LLM will not respond immediately when the node is set, but will instead wait for the user to speak first before responding. Defaults to True . Copy Ask AI # Example usage "respond_immediately" : False Function Handler Types LegacyFunctionHandler Callable[[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]] Legacy function handler that only receives arguments. Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) FlowFunctionHandler Callable[[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]] Modern function handler that receives both arguments and FlowManager . Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) DirectFunction DirectFunction Function that is meant to be passed directly into a NodeConfig rather than into the handler field of a function configuration. It must be an async function with flow_manager: FlowManager as its first parameter. It must return a ConsolidatedFunctionResult , which is a tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) ContextStrategy ContextStrategy Enum Strategy for managing conversation context during node transitions. Show Values APPEND str Default strategy. Adds new messages to existing context. RESET str Clears context and starts fresh with new messages. RESET_WITH_SUMMARY str Resets context but includes an AI-generated summary. ContextStrategyConfig ContextStrategyConfig dataclass Configuration for context management strategy. Show Fields strategy ContextStrategy required The strategy to use for context management summary_prompt Optional[str] Required when using RESET_WITH_SUMMARY. Prompt text for generating the conversation summary. Copy Ask AI # Example usage config = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) FlowsFunctionSchema FlowsFunctionSchema class A standardized schema for defining functions in Pipecat Flows with flow-specific properties. Show Constructor Parameters name str required Name of the function description str required Description of the function’s purpose properties Dict[str, Any] required Dictionary defining properties types and descriptions required List[str] required List of required parameter names handler Optional[FunctionHandler] Function handler to process the function call transition_to Optional[str] deprecated Target node to transition to after function execution Deprecated: instead of transition_to , use a “consolidated” handler that returns a tuple (result, next node). transition_callback Optional[Callable] deprecated Callback function for dynamic transitions Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). 
You cannot specify both transition_to and transition_callback in the same function schema. Example usage: Copy Ask AI from pipecat_flows import FlowsFunctionSchema # Define a function schema collect_name_schema = FlowsFunctionSchema( name = "collect_name" , description = "Record the user's name" , properties = { "name" : { "type" : "string" , "description" : "The user's name" } }, required = [ "name" ], handler = collect_name_handler ) # Use in node configuration node_config = { "name" : "greeting" , "task_messages" : [ { "role" : "system" , "content" : "Ask the user for their name." } ], "functions" : [collect_name_schema] } # Pass to flow manager await flow_manager.set_node_from_config(node_config) FlowManager FlowManager class Main class for managing conversation flows, supporting both static (configuration-driven) and dynamic (runtime-determined) flows. Show Constructor Parameters task PipelineTask required Pipeline task for frame queueing llm LLMService required LLM service instance (OpenAI, Anthropic, or Google). Must be initialized with the corresponding pipecat-ai provider dependency installed. context_aggregator Any required Context aggregator used for pushing messages to the LLM service tts Optional[Any] deprecated Optional TTS service for voice actions. Deprecated: No need to explicitly pass tts to FlowManager in order to use tts_say actions. flow_config Optional[FlowConfig] Optional static flow configuration context_strategy Optional[ContextStrategyConfig] Optional configuration for how context should be managed during transitions. Defaults to APPEND strategy if not specified. Methods initialize method Initialize the flow with starting messages. Show Parameters initial_node NodeConfig The initial conversation node (needed for dynamic flows only). If not specified, you’ll need to call set_node_from_config() to kick off the conversation. Show Raises FlowInitializationError If initialization fails set_node method deprecated Set up a new conversation node programmatically (dynamic flows only). In dynamic flows, the application advances the conversation using set_node to set up each next node. In static flows, set_node is triggered under the hood when a node contains a transition_to field. Deprecated: use the following patterns instead of set_node : Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() If you really need to set a node explicitly, use set_node_from_config() (note: its name will be read from its NodeConfig ) Show Parameters node_id str required Identifier for the new node node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails set_node_from_config method Set up a new conversation node programmatically (dynamic flows only). Note that this method should only be used in rare circumstances. Most often, you should: Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() Show Parameters node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails register_action method Register a handler for a custom action type. 
Show Parameters action_type str required String identifier for the action handler Callable required Async or sync function that handles the action get_current_context method Get the current conversation context. Returns a list of messages in the current context, including system messages, user messages, and assistant responses. Show Returns messages List[dict] List of messages in the current context Show Raises FlowError If context aggregator is not available Example usage: Copy Ask AI # Access current conversation context context = flow_manager.get_current_context() # Use in handlers async def process_response ( args : FlowArgs) -> tuple[FlowResult, str ]: context = flow_manager.get_current_context() # Process conversation history return { "status" : "success" }, "next" State Management The FlowManager provides a state dictionary for storing conversation data: Access state Access in transitions Copy Ask AI flow_manager.state: Dict[ str , Any] # Store data flow_manager.state[ "user_age" ] = 25 Usage Examples Static Flow Dynamic Flow Copy Ask AI flow_config: FlowConfig = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant. Your responses will be converted to audio." } ], "task_messages" : [ { "role" : "system" , "content" : "Start by greeting the user and asking for their name." } ], "functions" : [{ "type" : "function" , "function" : { "name" : "collect_name" , "handler" : collect_name_handler, "description" : "Record user's name" , "parameters" : { ... } } }] } } } # Create and initialize the FlowManager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) # Initialize the flow_manager to start the conversation await flow_manager.initialize() Node Functions concept Functions that execute operations within a single conversational state, without switching nodes. Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def process_data ( args : FlowArgs) -> tuple[FlowResult, None ]: """Handle data processing within a node.""" data = args[ "data" ] result = await process(data) return { "status" : "success" , "processed_data" : result }, None # Function configuration { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, "description" : "Process user data" , "parameters" : { "type" : "object" , "properties" : { "data" : { "type" : "string" } } } } } Edge Functions concept Functions that specify a transition between nodes (optionally processing data first). Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def next_step ( args : FlowArgs) -> tuple[ None , str ]: """Specify the next node to transition to.""" return None , "target_node" # Return NodeConfig instead of str for dynamic flows # Function configuration { "type" : "function" , "function" : { "name" : "next_step" , "handler" : next_step, "description" : "Transition to next node" , "parameters" : { "type" : "object" , "properties" : {}} } } Function Properties handler Optional[Callable] Async function that processes data within a node and/or specifies the next node ( more details here ). 
Can be specified as: Direct function reference Either a Callable function or a string with __function__: prefix (e.g., "__function__:process_data" ) to reference a function in the main script Direct Reference Function Token Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, # Callable function "parameters" : { ... } } } transition_callback Optional[Callable] deprecated Handler for dynamic flow transitions. Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). Must be an async function with one of these signatures: Copy Ask AI # New style (recommended) async def handle_transition ( args : Dict[ str , Any], result : FlowResult, flow_manager : FlowManager ) -> None : """Handle transition to next node.""" if result.available: # Type-safe access to result await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Legacy style (supported for backwards compatibility) async def handle_transition ( args : Dict[ str , Any], flow_manager : FlowManager ) -> None : """Handle transition to next node.""" await flow_manager.set_node_from_config(create_next_node()) The callback receives: args : Arguments from the function call result : Typed result from the function handler (new style only) flow_manager : Reference to the FlowManager instance Example usage: Copy Ask AI async def handle_availability_check ( args : Dict, result : TimeResult, # Typed result flow_manager : FlowManager ): """Handle availability check and transition based on result.""" if result.available: await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Use in function configuration { "type" : "function" , "function" : { "name" : "check_availability" , "handler" : check_availability, "parameters" : { ... }, "transition_callback" : handle_availability_check } } Note: A function cannot have both transition_to and transition_callback . Handler Signatures Function handlers passed as a handler in a function configuration can be defined with three different signatures: Modern (Args + FlowManager) Legacy (Args Only) No Arguments Copy Ask AI async def handler_with_flow_manager ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Modern handler that receives both arguments and FlowManager access.""" # Access state previous_data = flow_manager.state.get( "stored_data" ) # Access pipeline resources await flow_manager.task.queue_frame(TTSSpeakFrame( "Processing your request..." )) # Store data in state for later flow_manager.state[ "new_data" ] = args[ "input" ] return { "status" : "success" , "result" : "Processed with flow access" }, create_next_node() The framework automatically detects which signature your handler is using and calls it appropriately. If you’re passing your function directly into your NodeConfig rather than as a handler in a function configuration, you’d use a somewhat different signature: Direct Copy Ask AI async def do_something ( flow_manager : FlowManager, foo : int , bar : str = "" ) -> tuple[FlowResult, NodeConfig]: """ Do something interesting. Args: foo (int): The foo to do something interesting with. bar (string): The bar to do something interesting with. Defaults to empty string. 
""" result = await fetch_data(foo, bar) next_node = create_end_node() return result, next_node Return Types Success Response Error Response Copy Ask AI { "status" : "success" , "data" : "some data" # Optional additional data } Provider-Specific Formats You don’t need to handle these format differences manually - use the standard format and the FlowManager will adapt it for your chosen provider. OpenAI Anthropic Google (Gemini) Copy Ask AI { "type" : "function" , "function" : { "name" : "function_name" , "handler" : handler, "description" : "Description" , "parameters" : { ... } } } Actions pre_actions and post_actions are used to manage conversation flow. They are included in the NodeConfig and executed before and after the LLM completion, respectively. Three kinds of actions are available: Pre-canned actions: These actions perform common tasks and require little configuration. Function actions: These actions run developer-defined functions at the appropriate time. Custom actions: These are fully developer-defined actions, providing flexibility at the expense of complexity. Pre-canned Actions Common actions shipped with Flows for managing conversation flow. To use them, just add them to your NodeConfig . tts_say action Speaks text immediately using the TTS service. Copy Ask AI Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Processing your request..." # Required } ] end_conversation action Ends the conversation and closes the connection. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "end_conversation" , "text" : "Goodbye!" # Optional farewell message } ] Function Actions Actions that run developer-defined functions at the appropriate time. For example, if used in post_actions , they’ll run after the bot has finished talking and after any previous post_actions have finished. function action Calls the developer-defined function at the appropriate time. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "function" , "handler" : bot_turn_ended # Required } ] Custom Actions Fully developer-defined actions, providing flexibility at the expense of complexity. Here’s the complexity: because these actions aren’t queued in the Pipecat pipeline, they may execute seemingly early if used in post_actions ; they’ll run immediately after the LLM completion is triggered but won’t wait around for the bot to finish talking. Why would you want this behavior? You might be writing an action that: Itself just queues another Frame into the Pipecat pipeline (meaning there would no benefit to waiting around for sequencing purposes) Does work that can be done a bit sooner, like logging that the LLM was updated Custom actions are composed of at least: type str required String identifier for the action handler Callable required Async or sync function that handles the action Example: Copy Ask AI Copy Ask AI # Define custom action handler async def custom_notification ( action : dict , flow_manager : FlowManager): """Custom action handler.""" message = action.get( "message" , "" ) await notify_user(message) # Use in node configuration "pre_actions" : [ { "type" : "notify" , "handler" : send_notification, "message" : "Attention!" , } ] Exceptions FlowError exception Base exception for all flow-related errors. Copy Ask AI Copy Ask AI from pipecat_flows import FlowError try : await flow_manager.set_node_from_config(config) except FlowError as e: print ( f "Flow error: { e } " ) FlowInitializationError exception Raised when flow initialization fails. 
Copy Ask AI Copy Ask AI from pipecat_flows import FlowInitializationError try : await flow_manager.initialize() except FlowInitializationError as e: print ( f "Initialization failed: { e } " ) FlowTransitionError exception Raised when a state transition fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowTransitionError try : await flow_manager.set_node_from_config(node_config) except FlowTransitionError as e: print ( f "Transition failed: { e } " ) InvalidFunctionError exception Raised when an invalid or unavailable function is specified. Copy Ask AI Copy Ask AI from pipecat_flows import InvalidFunctionError try : await flow_manager.set_node_from_config({ "functions" : [{ "type" : "function" , "function" : { "name" : "invalid_function" } }] }) except InvalidFunctionError as e: print ( f "Invalid function: { e } " )
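To round out the static-flow usage example above, here is a sketch of what the collect_name_handler it references could look like: a consolidated handler that returns both a result and the name of the next node in the FlowConfig (the "confirm" node id is illustrative):

from pipecat_flows import FlowArgs, FlowResult

async def collect_name_handler(args: FlowArgs) -> tuple[FlowResult, str]:
    """Record the user's name and transition to the next static node."""
    name = args["name"]
    # In a static flow the second tuple element is a node id from the FlowConfig
    return {"status": "success", "name": name}, "confirm"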
|
flows_pipecat-flows_b3264db4.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/frameworks/flows/pipecat-flows#param-strategy
|
2 |
+
Title: Pipecat Flows - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Pipecat Flows - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Frameworks Pipecat Flows Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline New to building conversational flows? Check out our Pipecat Flows guide first. Installation Existing Pipecat installation Fresh Pipecat installation Copy Ask AI pip install pipecat-ai-flows Core Types FlowArgs FlowArgs Dict[str, Any] Type alias for function handler arguments. FlowResult FlowResult TypedDict Base type for function handler results. Additional fields can be included as needed. Show Fields status str Optional status field error str Optional error message FlowConfig FlowConfig TypedDict Configuration for the entire conversation flow. Show Fields initial_node str required Starting node identifier nodes Dict[str, NodeConfig] required Map of node names to configurations NodeConfig NodeConfig TypedDict Configuration for a single node in the flow. Show Fields name str The name of the node, used in debug logging in dynamic flows. If no name is specified, an automatically-generated UUID is used. Copy Ask AI # Example name "name" : "greeting" role_messages List[dict] Defines the role or persona of the LLM. Required for the initial node and optional for subsequent nodes. Copy Ask AI # Example role messages "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant..." } ], task_messages List[dict] required Defines the task for a given node. Required for all nodes. Copy Ask AI # Example task messages "task_messages" : [ { "role" : "system" , # May be `user` depending on the LLM "content" : "Ask the user for their name..." } ], context_strategy ContextStrategyConfig Strategy for managing context during transitions to this node. Copy Ask AI # Example context strategy configuration "context_strategy" : ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) functions List[Union[dict, FlowsFunctionSchema]] required LLM function / tool call configurations, defined in one of the supported formats . Copy Ask AI # Using provider-specific dictionary format "functions" : [ { "type" : "function" , "function" : { "name" : "get_current_movies" , "handler" : get_movies, "description" : "Fetch movies currently playing" , "parameters" : { ... } }, } ] # Using FlowsFunctionSchema "functions" : [ FlowsFunctionSchema( name = "get_current_movies" , description = "Fetch movies currently playing" , properties = { ... }, required = [ ... ], handler = get_movies ) ] # Using direct functions (auto-configuration) "functions" : [get_movies] pre_actions List[dict] Actions that execute before the LLM inference. For example, you can send a message to the TTS to speak a phrase (e.g. “Hold on a moment…”), which may be effective if an LLM function call takes time to execute. 
Copy Ask AI # Example pre_actions "pre_actions" : [ { "type" : "tts_say" , "text" : "Hold on a moment..." } ], post_actions List[dict] Actions that execute after the LLM inference. For example, you can end the conversation. Copy Ask AI # Example post_actions "post_actions" : [ { "type" : "end_conversation" } ] respond_immediately bool If set to False , the LLM will not respond immediately when the node is set, but will instead wait for the user to speak first before responding. Defaults to True . Copy Ask AI # Example usage "respond_immediately" : False Function Handler Types LegacyFunctionHandler Callable[[FlowArgs], Awaitable[FlowResult | ConsolidatedFunctionResult]] Legacy function handler that only receives arguments. Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) FlowFunctionHandler Callable[[FlowArgs, FlowManager], Awaitable[FlowResult | ConsolidatedFunctionResult]] Modern function handler that receives both arguments and FlowManager . Returns either: A FlowResult (⚠️ deprecated) A “consolidated” result tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) DirectFunction DirectFunction Function that is meant to be passed directly into a NodeConfig rather than into the handler field of a function configuration. It must be an async function with flow_manager: FlowManager as its first parameter. It must return a ConsolidatedFunctionResult , which is a tuple (result, next node) where: result is an optional FlowResult next node is an optional NodeConfig (for dynamic flows) or string (for static flows) ContextStrategy ContextStrategy Enum Strategy for managing conversation context during node transitions. Show Values APPEND str Default strategy. Adds new messages to existing context. RESET str Clears context and starts fresh with new messages. RESET_WITH_SUMMARY str Resets context but includes an AI-generated summary. ContextStrategyConfig ContextStrategyConfig dataclass Configuration for context management strategy. Show Fields strategy ContextStrategy required The strategy to use for context management summary_prompt Optional[str] Required when using RESET_WITH_SUMMARY. Prompt text for generating the conversation summary. Copy Ask AI # Example usage config = ContextStrategyConfig( strategy = ContextStrategy. RESET_WITH_SUMMARY , summary_prompt = "Summarize the key points discussed so far." ) FlowsFunctionSchema FlowsFunctionSchema class A standardized schema for defining functions in Pipecat Flows with flow-specific properties. Show Constructor Parameters name str required Name of the function description str required Description of the function’s purpose properties Dict[str, Any] required Dictionary defining properties types and descriptions required List[str] required List of required parameter names handler Optional[FunctionHandler] Function handler to process the function call transition_to Optional[str] deprecated Target node to transition to after function execution Deprecated: instead of transition_to , use a “consolidated” handler that returns a tuple (result, next node). transition_callback Optional[Callable] deprecated Callback function for dynamic transitions Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). 
You cannot specify both transition_to and transition_callback in the same function schema. Example usage: Copy Ask AI from pipecat_flows import FlowsFunctionSchema # Define a function schema collect_name_schema = FlowsFunctionSchema( name = "collect_name" , description = "Record the user's name" , properties = { "name" : { "type" : "string" , "description" : "The user's name" } }, required = [ "name" ], handler = collect_name_handler ) # Use in node configuration node_config = { "name" : "greeting" , "task_messages" : [ { "role" : "system" , "content" : "Ask the user for their name." } ], "functions" : [collect_name_schema] } # Pass to flow manager await flow_manager.set_node_from_config(node_config) FlowManager FlowManager class Main class for managing conversation flows, supporting both static (configuration-driven) and dynamic (runtime-determined) flows. Show Constructor Parameters task PipelineTask required Pipeline task for frame queueing llm LLMService required LLM service instance (OpenAI, Anthropic, or Google). Must be initialized with the corresponding pipecat-ai provider dependency installed. context_aggregator Any required Context aggregator used for pushing messages to the LLM service tts Optional[Any] deprecated Optional TTS service for voice actions. Deprecated: No need to explicitly pass tts to FlowManager in order to use tts_say actions. flow_config Optional[FlowConfig] Optional static flow configuration context_strategy Optional[ContextStrategyConfig] Optional configuration for how context should be managed during transitions. Defaults to APPEND strategy if not specified. Methods initialize method Initialize the flow with starting messages. Show Parameters initial_node NodeConfig The initial conversation node (needed for dynamic flows only). If not specified, you’ll need to call set_node_from_config() to kick off the conversation. Show Raises FlowInitializationError If initialization fails set_node method deprecated Set up a new conversation node programmatically (dynamic flows only). In dynamic flows, the application advances the conversation using set_node to set up each next node. In static flows, set_node is triggered under the hood when a node contains a transition_to field. Deprecated: use the following patterns instead of set_node : Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() If you really need to set a node explicitly, use set_node_from_config() (note: its name will be read from its NodeConfig ) Show Parameters node_id str required Identifier for the new node node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails set_node_from_config method Set up a new conversation node programmatically (dynamic flows only). Note that this method should only be used in rare circumstances. Most often, you should: Prefer “consolidated” function handlers that return a tuple (result, next node), which implicitly sets up the next node Prefer passing your initial node to FlowManager.initialize() Show Parameters node_config NodeConfig required Node configuration including messages, functions, and actions Show Raises FlowError If node setup fails register_action method Register a handler for a custom action type. 
Show Parameters action_type str required String identifier for the action handler Callable required Async or sync function that handles the action get_current_context method Get the current conversation context. Returns a list of messages in the current context, including system messages, user messages, and assistant responses. Show Returns messages List[dict] List of messages in the current context Show Raises FlowError If context aggregator is not available Example usage: Copy Ask AI # Access current conversation context context = flow_manager.get_current_context() # Use in handlers async def process_response ( args : FlowArgs) -> tuple[FlowResult, str ]: context = flow_manager.get_current_context() # Process conversation history return { "status" : "success" }, "next" State Management The FlowManager provides a state dictionary for storing conversation data: Access state Access in transitions Copy Ask AI flow_manager.state: Dict[ str , Any] # Store data flow_manager.state[ "user_age" ] = 25 Usage Examples Static Flow Dynamic Flow Copy Ask AI flow_config: FlowConfig = { "initial_node" : "greeting" , "nodes" : { "greeting" : { "role_messages" : [ { "role" : "system" , "content" : "You are a helpful assistant. Your responses will be converted to audio." } ], "task_messages" : [ { "role" : "system" , "content" : "Start by greeting the user and asking for their name." } ], "functions" : [{ "type" : "function" , "function" : { "name" : "collect_name" , "handler" : collect_name_handler, "description" : "Record user's name" , "parameters" : { ... } } }] } } } # Create and initialize the FlowManager flow_manager = FlowManager( task = task, llm = llm, context_aggregator = context_aggregator, flow_config = flow_config ) # Initialize the flow_manager to start the conversation await flow_manager.initialize() Node Functions concept Functions that execute operations within a single conversational state, without switching nodes. Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def process_data ( args : FlowArgs) -> tuple[FlowResult, None ]: """Handle data processing within a node.""" data = args[ "data" ] result = await process(data) return { "status" : "success" , "processed_data" : result }, None # Function configuration { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, "description" : "Process user data" , "parameters" : { "type" : "object" , "properties" : { "data" : { "type" : "string" } } } } } Edge Functions concept Functions that specify a transition between nodes (optionally processing data first). Copy Ask AI Copy Ask AI from pipecat_flows import FlowArgs, FlowResult async def next_step ( args : FlowArgs) -> tuple[ None , str ]: """Specify the next node to transition to.""" return None , "target_node" # Return NodeConfig instead of str for dynamic flows # Function configuration { "type" : "function" , "function" : { "name" : "next_step" , "handler" : next_step, "description" : "Transition to next node" , "parameters" : { "type" : "object" , "properties" : {}} } } Function Properties handler Optional[Callable] Async function that processes data within a node and/or specifies the next node ( more details here ). 
Can be specified as: Direct function reference Either a Callable function or a string with __function__: prefix (e.g., "__function__:process_data" ) to reference a function in the main script Direct Reference Function Token Copy Ask AI { "type" : "function" , "function" : { "name" : "process_data" , "handler" : process_data, # Callable function "parameters" : { ... } } } transition_callback Optional[Callable] deprecated Handler for dynamic flow transitions. Deprecated: instead of transition_callback , use a “consolidated” handler that returns a tuple (result, next node). Must be an async function with one of these signatures: Copy Ask AI # New style (recommended) async def handle_transition ( args : Dict[ str , Any], result : FlowResult, flow_manager : FlowManager ) -> None : """Handle transition to next node.""" if result.available: # Type-safe access to result await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Legacy style (supported for backwards compatibility) async def handle_transition ( args : Dict[ str , Any], flow_manager : FlowManager ) -> None : """Handle transition to next node.""" await flow_manager.set_node_from_config(create_next_node()) The callback receives: args : Arguments from the function call result : Typed result from the function handler (new style only) flow_manager : Reference to the FlowManager instance Example usage: Copy Ask AI async def handle_availability_check ( args : Dict, result : TimeResult, # Typed result flow_manager : FlowManager ): """Handle availability check and transition based on result.""" if result.available: await flow_manager.set_node_from_config(create_confirmation_node()) else : await flow_manager.set_node_from_config( create_no_availability_node(result.alternative_times) ) # Use in function configuration { "type" : "function" , "function" : { "name" : "check_availability" , "handler" : check_availability, "parameters" : { ... }, "transition_callback" : handle_availability_check } } Note: A function cannot have both transition_to and transition_callback . Handler Signatures Function handlers passed as a handler in a function configuration can be defined with three different signatures: Modern (Args + FlowManager) Legacy (Args Only) No Arguments Copy Ask AI async def handler_with_flow_manager ( args : FlowArgs, flow_manager : FlowManager) -> tuple[FlowResult, NodeConfig]: """Modern handler that receives both arguments and FlowManager access.""" # Access state previous_data = flow_manager.state.get( "stored_data" ) # Access pipeline resources await flow_manager.task.queue_frame(TTSSpeakFrame( "Processing your request..." )) # Store data in state for later flow_manager.state[ "new_data" ] = args[ "input" ] return { "status" : "success" , "result" : "Processed with flow access" }, create_next_node() The framework automatically detects which signature your handler is using and calls it appropriately. If you’re passing your function directly into your NodeConfig rather than as a handler in a function configuration, you’d use a somewhat different signature: Direct Copy Ask AI async def do_something ( flow_manager : FlowManager, foo : int , bar : str = "" ) -> tuple[FlowResult, NodeConfig]: """ Do something interesting. Args: foo (int): The foo to do something interesting with. bar (string): The bar to do something interesting with. Defaults to empty string. 
""" result = await fetch_data(foo, bar) next_node = create_end_node() return result, next_node Return Types Success Response Error Response Copy Ask AI { "status" : "success" , "data" : "some data" # Optional additional data } Provider-Specific Formats You don’t need to handle these format differences manually - use the standard format and the FlowManager will adapt it for your chosen provider. OpenAI Anthropic Google (Gemini) Copy Ask AI { "type" : "function" , "function" : { "name" : "function_name" , "handler" : handler, "description" : "Description" , "parameters" : { ... } } } Actions pre_actions and post_actions are used to manage conversation flow. They are included in the NodeConfig and executed before and after the LLM completion, respectively. Three kinds of actions are available: Pre-canned actions: These actions perform common tasks and require little configuration. Function actions: These actions run developer-defined functions at the appropriate time. Custom actions: These are fully developer-defined actions, providing flexibility at the expense of complexity. Pre-canned Actions Common actions shipped with Flows for managing conversation flow. To use them, just add them to your NodeConfig . tts_say action Speaks text immediately using the TTS service. Copy Ask AI Copy Ask AI "pre_actions" : [ { "type" : "tts_say" , "text" : "Processing your request..." # Required } ] end_conversation action Ends the conversation and closes the connection. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "end_conversation" , "text" : "Goodbye!" # Optional farewell message } ] Function Actions Actions that run developer-defined functions at the appropriate time. For example, if used in post_actions , they’ll run after the bot has finished talking and after any previous post_actions have finished. function action Calls the developer-defined function at the appropriate time. Copy Ask AI Copy Ask AI "post_actions" : [ { "type" : "function" , "handler" : bot_turn_ended # Required } ] Custom Actions Fully developer-defined actions, providing flexibility at the expense of complexity. Here’s the complexity: because these actions aren’t queued in the Pipecat pipeline, they may execute seemingly early if used in post_actions ; they’ll run immediately after the LLM completion is triggered but won’t wait around for the bot to finish talking. Why would you want this behavior? You might be writing an action that: Itself just queues another Frame into the Pipecat pipeline (meaning there would no benefit to waiting around for sequencing purposes) Does work that can be done a bit sooner, like logging that the LLM was updated Custom actions are composed of at least: type str required String identifier for the action handler Callable required Async or sync function that handles the action Example: Copy Ask AI Copy Ask AI # Define custom action handler async def custom_notification ( action : dict , flow_manager : FlowManager): """Custom action handler.""" message = action.get( "message" , "" ) await notify_user(message) # Use in node configuration "pre_actions" : [ { "type" : "notify" , "handler" : send_notification, "message" : "Attention!" , } ] Exceptions FlowError exception Base exception for all flow-related errors. Copy Ask AI Copy Ask AI from pipecat_flows import FlowError try : await flow_manager.set_node_from_config(config) except FlowError as e: print ( f "Flow error: { e } " ) FlowInitializationError exception Raised when flow initialization fails. 
Copy Ask AI Copy Ask AI from pipecat_flows import FlowInitializationError try : await flow_manager.initialize() except FlowInitializationError as e: print ( f "Initialization failed: { e } " ) FlowTransitionError exception Raised when a state transition fails. Copy Ask AI Copy Ask AI from pipecat_flows import FlowTransitionError try : await flow_manager.set_node_from_config(node_config) except FlowTransitionError as e: print ( f "Transition failed: { e } " ) InvalidFunctionError exception Raised when an invalid or unavailable function is specified. Copy Ask AI Copy Ask AI from pipecat_flows import InvalidFunctionError try : await flow_manager.set_node_from_config({ "functions" : [{ "type" : "function" , "function" : { "name" : "invalid_function" } }] }) except InvalidFunctionError as e: print ( f "Invalid function: { e } " ) RTVI Observer PipelineParams On this page Installation Core Types FlowArgs FlowResult FlowConfig NodeConfig Function Handler Types ContextStrategy ContextStrategyConfig FlowsFunctionSchema FlowManager Methods State Management Usage Examples Function Properties Handler Signatures Return Types Provider-Specific Formats Actions Pre-canned Actions Function Actions Custom Actions Exceptions Assistant Responses are generated using AI and may contain mistakes.
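Pulling the pieces above together, the following is a minimal sketch (not taken from the reference itself) of a dynamic flow that uses a direct function returning a consolidated (result, next node) tuple. The node contents, names, and the start_flow wiring are illustrative assumptions; NodeConfig is assumed to be importable alongside FlowManager and FlowResult.

from pipecat_flows import FlowManager, FlowResult, NodeConfig


async def collect_name(flow_manager: FlowManager, name: str) -> tuple[FlowResult, NodeConfig]:
    """Record the user's name.

    Args:
        name: The user's name.
    """
    flow_manager.state["name"] = name  # stash for later nodes
    return {"status": "success", "name": name}, create_goodbye_node()


def create_greeting_node() -> NodeConfig:
    return {
        "name": "greeting",
        "role_messages": [{"role": "system", "content": "You are a friendly assistant."}],
        "task_messages": [{"role": "system", "content": "Ask the user for their name."}],
        "functions": [collect_name],  # direct function, auto-configured from signature and docstring
    }


def create_goodbye_node() -> NodeConfig:
    return {
        "name": "goodbye",
        "task_messages": [{"role": "system", "content": "Thank the user by name and say goodbye."}],
        "functions": [],
        "post_actions": [{"type": "end_conversation"}],
    }


async def start_flow(task, llm, context_aggregator):
    # task, llm, and context_aggregator come from your existing pipeline setup
    flow_manager = FlowManager(task=task, llm=llm, context_aggregator=context_aggregator)
    await flow_manager.initialize(create_greeting_node())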
frame_producer-consumer_293a2488.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/frame/producer-consumer
Title: Producer & Consumer Processors - Pipecat
==================================================

Producer & Consumer Processors - Pipecat Overview The Producer and Consumer processors work as a pair to route frames between different parts of a pipeline, particularly useful when working with ParallelPipeline . They allow you to selectively capture frames from one pipeline branch and inject them into another. ProducerProcessor ProducerProcessor examines frames flowing through the pipeline, applies a filter to decide which frames to share, and optionally transforms these frames before sending them to connected consumers. Constructor Parameters filter Callable[[Frame], Awaitable[bool]] required An async function that determines which frames should be sent to consumers. Should return True for frames to be shared. transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before sending to consumers. By default, passes frames unchanged. passthrough bool default: "True" When True , passes all frames through the normal pipeline flow. When False , only passes through frames that don't match the filter. ConsumerProcessor ConsumerProcessor receives frames from a ProducerProcessor and injects them into its pipeline branch. Constructor Parameters producer ProducerProcessor required The producer processor that will send frames to this consumer. transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before injecting them into the pipeline. direction FrameDirection default: "FrameDirection.DOWNSTREAM" The direction in which to push received frames. Usually DOWNSTREAM to send frames forward in the pipeline. Usage Examples Basic Usage: Moving TTS Audio Between Branches Copy Ask AI # Create a producer that captures TTS audio frames async def is_tts_audio ( frame : Frame) -> bool : return isinstance (frame, TTSAudioRawFrame) # Define an async transformer function async def tts_to_input_audio_transformer ( frame : Frame) -> Frame: if isinstance (frame, TTSAudioRawFrame): # Convert TTS audio to input audio format return InputAudioRawFrame( audio = frame.audio, sample_rate = frame.sample_rate, num_channels = frame.num_channels ) return frame producer = ProducerProcessor( filter = is_tts_audio, transformer = tts_to_input_audio_transformer, passthrough = True # Keep these frames in original pipeline ) # Create a consumer to receive the frames consumer = ConsumerProcessor( producer = producer, direction = FrameDirection.DOWNSTREAM ) # Use in a ParallelPipeline pipeline = Pipeline([ transport.input(), ParallelPipeline( # Branch 1: LLM for bot responses [ llm, tts, producer, # Capture TTS audio here ], # Branch 2: Audio processing branch [ consumer, # Receive TTS audio here llm, # Speech-to-Speech LLM (audio in) ] ), transport.output(), ])
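The example above copies the TTS audio, since passthrough=True keeps the matched frames flowing through branch 1 as well. Per the passthrough description, a variation like the following sketch (reusing is_tts_audio and tts_to_input_audio_transformer from above) should move the frames instead of copying them; treat it as illustrative rather than canonical.

# With passthrough=False, frames matching the filter are handed to consumers
# and no longer continue down the producer's own branch.
producer = ProducerProcessor(
    filter=is_tts_audio,
    transformer=tts_to_input_audio_transformer,
    passthrough=False,
)

consumer = ConsumerProcessor(
    producer=producer,
    direction=FrameDirection.DOWNSTREAM,
)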
frame_producer-consumer_8615da3b.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/utilities/frame/producer-consumer#constructor-parameters
Title: Producer & Consumer Processors - Pipecat
==================================================

Producer & Consumer Processors - Pipecat Overview The Producer and Consumer processors work as a pair to route frames between different parts of a pipeline, particularly useful when working with ParallelPipeline . They allow you to selectively capture frames from one pipeline branch and inject them into another. ProducerProcessor ProducerProcessor examines frames flowing through the pipeline, applies a filter to decide which frames to share, and optionally transforms these frames before sending them to connected consumers. Constructor Parameters filter Callable[[Frame], Awaitable[bool]] required An async function that determines which frames should be sent to consumers. Should return True for frames to be shared. transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before sending to consumers. By default, passes frames unchanged. passthrough bool default: "True" When True , passes all frames through the normal pipeline flow. When False , only passes through frames that don't match the filter. ConsumerProcessor ConsumerProcessor receives frames from a ProducerProcessor and injects them into its pipeline branch. Constructor Parameters producer ProducerProcessor required The producer processor that will send frames to this consumer. transformer Callable[[Frame], Awaitable[Frame]] default: "identity_transformer" Optional async function that transforms frames before injecting them into the pipeline. direction FrameDirection default: "FrameDirection.DOWNSTREAM" The direction in which to push received frames. Usually DOWNSTREAM to send frames forward in the pipeline. Usage Examples Basic Usage: Moving TTS Audio Between Branches Copy Ask AI # Create a producer that captures TTS audio frames async def is_tts_audio ( frame : Frame) -> bool : return isinstance (frame, TTSAudioRawFrame) # Define an async transformer function async def tts_to_input_audio_transformer ( frame : Frame) -> Frame: if isinstance (frame, TTSAudioRawFrame): # Convert TTS audio to input audio format return InputAudioRawFrame( audio = frame.audio, sample_rate = frame.sample_rate, num_channels = frame.num_channels ) return frame producer = ProducerProcessor( filter = is_tts_audio, transformer = tts_to_input_audio_transformer, passthrough = True # Keep these frames in original pipeline ) # Create a consumer to receive the frames consumer = ConsumerProcessor( producer = producer, direction = FrameDirection.DOWNSTREAM ) # Use in a ParallelPipeline pipeline = Pipeline([ transport.input(), ParallelPipeline( # Branch 1: LLM for bot responses [ llm, tts, producer, # Capture TTS audio here ], # Branch 2: Audio processing branch [ consumer, # Receive TTS audio here llm, # Speech-to-Speech LLM (audio in) ] ), transport.output(), ])
fundamentals_context-management_3438fe83.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/context-management#assistant-context-aggregator
Title: Context Management - Pipecat
==================================================

Context Management - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Context Management Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal What is Context in Pipecat? In Pipecat, context refers to the text that the LLM uses to perform an inference. Commonly, this is the text inputted to the LLM and outputted from the LLM. The context consists of a list of alternating user/assistant messages that represents the information you want an LLM to respond to. Since Pipecat is a real-time voice (and multimodal) AI framework, the context serves as the collective history of the entire conversation. How Context Updates During Conversations After every user and bot turn in the conversation, processors in the pipeline push frames to update the context: STT Service : Pushes TranscriptionFrame objects that represent what the user says. LLM and TTS Services : Work together to represent what the bot says. The LLM streams tokens (as LLMTextFrame s) to the TTS service, which outputs TTSTextFrame s representing the bot’s spoken words. Setting Up Context Management Pipecat includes a context aggregator class that creates and manages context for both user and assistant messages. Here’s how to set it up: 1. Create the Context and Context Aggregator Copy Ask AI # Create LLM service llm = OpenAILLMService( api_key = os.getenv( "OPENAI_API_KEY" )) # Create context context = OpenAILLMContext(messages, tools) # Create context aggregator instance context_aggregator = llm.create_context_aggregator(context) The context (which represents the conversation) is passed to the context aggregator. This ensures that both user and assistant instances of the context aggregators have access to the shared conversation context. 2. Add Context Aggregators to Your Pipeline Copy Ask AI pipeline = Pipeline([ transport.input(), stt, context_aggregator.user(), # User context aggregator llm, tts, transport.output(), context_aggregator.assistant(), # Assistant context aggregator ]) Context Aggregator Placement The placement of context aggregator instances in your pipeline is crucial for proper operation: User Context Aggregator Place the user context aggregator downstream from the STT service . Since the user’s speech results in TranscriptionFrame objects pushed by the STT service, the user aggregator needs to be positioned to collect these frames. Assistant Context Aggregator Place the assistant context aggregator after transport.output() . This positioning is important because: The TTS service outputs spoken words in addition to audio The assistant aggregator must be downstream to collect those frames It ensures context updates happen word-by-word for specific services (e.g. Cartesia, ElevenLabs, and Rime) Your context stays updated at the word level in case an interruption occurs Always place the assistant context aggregator after transport.output() to ensure proper word-level context updates during interruptions. 
Manually Managing Context You can programmatically add new messages to the context by pushing or queueing specific frames: Adding Messages LLMMessagesAppendFrame : Appends a new message to the existing context LLMMessagesUpdateFrame : Completely replaces the existing context with new context provided in the frame Retrieving Current Context The context aggregator provides a get_context_frame() method to obtain the latest context: Copy Ask AI await task.queue_frames([context_aggregator.user().get_context_frame()]) Triggering Bot Responses You’ll commonly use this manual mechanism—obtaining the current context and pushing/queueing it—to trigger the bot to speak in two scenarios: Starting a pipeline where the bot should speak first After pushing new context frames using LLMMessagesAppendFrame or LLMMessagesUpdateFrame This gives you fine-grained control over when and how the bot responds during the conversation flow. Guides Custom FrameProcessor On this page What is Context in Pipecat? How Context Updates During Conversations Setting Up Context Management 1. Create the Context and Context Aggregator 2. Add Context Aggregators to Your Pipeline Context Aggregator Placement User Context Aggregator Assistant Context Aggregator Manually Managing Context Adding Messages Retrieving Current Context Triggering Bot Responses Assistant Responses are generated using AI and may contain mistakes.
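To make the append-then-trigger pattern above concrete, here is a small sketch; the message text is illustrative, and it assumes the task and context_aggregator created earlier are in scope of an async handler.

from pipecat.frames.frames import LLMMessagesAppendFrame

# Inject a new system message mid-conversation, then ask the bot to respond.
await task.queue_frames([
    LLMMessagesAppendFrame(messages=[
        {"role": "system", "content": "The user just returned; welcome them back briefly."}
    ]),
    context_aggregator.user().get_context_frame(),  # triggers the bot's next response
])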
fundamentals_custom-frame-processor_4a4eb987.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/custom-frame-processor#adding-to-a-pipeline
Title: Custom FrameProcessor - Pipecat
==================================================

Custom FrameProcessor - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Custom FrameProcessor Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Pipecat’s architecture is made up of a Pipeline, FrameProcessors, and Frames. See the Core Concepts for a full review. From that architecture, recall that FrameProcessors are the workers in the pipeline that receive frames and complete actions based on the frames received. Pipecat comes with many FrameProcessors built in. These consist of services, like OpenAILLMService or CartesiaTTSService , utilities, like UserIdleProcessor , and other things. Largely, you can build most of your application with these built-in FrameProcessors, but commonly, your application code may require custom frame processing logic. For example, you may want to perform an action as a result of a frame that’s pushed in the pipeline. Example: ImageSyncAggregator Let’s look at an example custom FrameProcessor that synchronizes images with bot speech: Copy Ask AI class ImageSyncAggregator ( FrameProcessor ): def __init__ ( self , speaking_path : str , waiting_path : str ): super (). __init__ () self ._speaking_image = Image.open(speaking_path) self ._speaking_image_format = self ._speaking_image.format self ._speaking_image_bytes = self ._speaking_image.tobytes() self ._waiting_image = Image.open(waiting_path) self ._waiting_image_format = self ._waiting_image.format self ._waiting_image_bytes = self ._waiting_image.tobytes() async def process_frame ( self , frame : Frame, direction : FrameDirection): await super ().process_frame(frame, direction) if isinstance (frame, BotStartedSpeakingFrame): await self .push_frame( OutputImageRawFrame( image = self ._speaking_image_bytes, size = ( 1024 , 1024 ), format = self ._speaking_image_format, ) ) elif isinstance (frame, BotStoppedSpeakingFrame): await self .push_frame( OutputImageRawFrame( image = self ._waiting_image_bytes, size = ( 1024 , 1024 ), format = self ._waiting_image_format, ) ) await self .push_frame(frame) This example custom FrameProcessor looks for BotStartedSpeakingFrame and BotStoppedSpeakingFrame . When it sees a BotStartedSpeakingFrame , it will show an image that says the bot is speaking. When it sees a BotStoppedSpeakingFrame , it will show an image that says the bot is not speaking. 
See this working example using the ImageSyncAggregator FrameProcessor Adding to a Pipeline This custom FrameProcessor can be added to a Pipeline just before the transport output: Copy Ask AI # Create and initialize the custom FrameProcessor image_sync_aggregator = ImageSyncAggregator( os.path.join(os.path.dirname( __file__ ), "assets" , "speaking.png" ), os.path.join(os.path.dirname( __file__ ), "assets" , "waiting.png" ), ) pipeline = Pipeline( [ transport.input(), stt, context_aggregator.user(), llm, tts, image_sync_aggregator, # Our custom FrameProcessor transport.output(), context_aggregator.assistant(), ] ) With this positioning, the ImageSyncAggregator FrameProcessor will receive the BotStartedSpeakingFrame and BotStoppedSpeakingFrame outputted by the TTS processor and then push its own frame— OutputImageRawFrame —to the output transport. Key Requirements FrameProcessors must inherit from the base FrameProcessor class. This ensures that your custom FrameProcessor will correctly handle frames like StartFrame , EndFrame , StartInterruptionFrame without having to write custom logic for those frames. This inheritance also provides it with the ability to process_frame() and push_frame() : process_frame() is what allows the FrameProcessor to receive frames and add custom conditional logic based on the frames that are received. push_frame() allows the FrameProcessor to push frames to the pipeline. Normally, frames are pushed DOWNSTREAM, but based on which processors need the output, you can also push UPSTREAM or in both directions. Essential Implementation Details To ensure proper base class inheritance, it’s critical to include: super().__init__() in your __init__ method await super().process_frame(frame, direction) in your process_frame() method Copy Ask AI class MyCustomProcessor ( FrameProcessor ): def __init__ ( self , ** kwargs ): super (). __init__ ( ** kwargs) # ✅ Required # Your initialization code here async def process_frame ( self , frame : Frame, direction : FrameDirection): await super ().process_frame(frame, direction) # ✅ Required # Your custom frame processing logic here if isinstance (frame, SomeSpecificFrame): # Handle the frame pass await self .push_frame(frame) # ✅ Required - pass frame through Critical Responsibility: Frame Forwarding FrameProcessors receive all frames that are pushed through the pipeline. This gives them a lot of power, but also a great responsibility. Critically, they must push all frames through the pipeline; if they don’t, they block frames from moving through the Pipeline, which will cause issues in how your application functions. You can see this at work in the ImageSyncAggregator ’s process_frame() method. It handles both bot speaking frames and also has an await self.push_frame(frame) which pushes the frame through to the next processor in the pipeline. Frame Direction When pushing frames, you can specify the direction: Copy Ask AI # Push downstream (default) await self .push_frame(frame) await self .push_frame(frame, FrameDirection. DOWNSTREAM ) # Push upstream await self .push_frame(frame, FrameDirection. UPSTREAM ) Most custom FrameProcessors will push frames downstream, but upstream can be useful for sending control frames or error notifications back up the pipeline. 
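As a concrete (hypothetical) illustration of an upstream push, the sketch below logs transcriptions and reports a failure upstream with an ErrorFrame while still forwarding every frame; save_transcript is a stand-in for your own storage call, not a Pipecat API.

from pipecat.frames.frames import ErrorFrame, Frame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranscriptLoggerProcessor(FrameProcessor):
    """Logs user transcriptions; reports storage failures upstream."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)  # required base-class call

        if isinstance(frame, TranscriptionFrame):
            try:
                await save_transcript(frame.text)  # hypothetical storage helper
            except Exception as e:
                # Report the problem upstream without blocking the pipeline
                await self.push_frame(
                    ErrorFrame(error=f"Transcript save failed: {e}"),
                    FrameDirection.UPSTREAM,
                )

        await self.push_frame(frame, direction)  # always forward the original frame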
Best Practices Always call the parent methods : Use super().__init__() and await super().process_frame() Forward all frames : Make sure every frame is pushed through with await self.push_frame(frame) Handle frames conditionally : Use isinstance() checks to handle specific frame types Use proper error handling : Wrap risky operations in try/catch blocks Position carefully in pipeline : Consider where in the pipeline your processor needs to be to receive the right frames With these patterns, you can create powerful custom FrameProcessors that extend Pipecat’s capabilities for your specific use case. Context Management Detecting Idle Users On this page Example: ImageSyncAggregator Adding to a Pipeline Key Requirements Essential Implementation Details Critical Responsibility: Frame Forwarding Frame Direction Best Practices Assistant Responses are generated using AI and may contain mistakes.
fundamentals_detecting-user-idle_3b1dfd0f.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/detecting-user-idle#how-it-works
Title: Detecting Idle Users - Pipecat
==================================================

Detecting Idle Users - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Detecting Idle Users Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Overview In conversational applications, it’s important to handle situations where users go silent or inactive. Pipecat’s UserIdleProcessor helps you detect when users haven’t spoken for a defined period, allowing your bot to respond appropriately. How It Works The UserIdleProcessor monitors user activity and: Starts tracking after the first interaction (user or bot speaking) Resets a timer whenever the user speaks Calls your custom callback function when the user is idle for longer than the timeout period Provides a retry counter to track consecutive idle periods Allows you to implement escalating responses or gracefully end the conversation The processor uses speech events (not audio frames) to detect activity, so it requires an active speech-to-text service or a transport with built-in speech detection. Basic Implementation Step 1: Create a Handler Function First, create a callback function that will be triggered when the user is idle: Copy Ask AI # Simple handler that doesn't use retry count async def handle_user_idle ( processor ): # Send a reminder to the user await processor.push_frame(TTSSpeakFrame( "Are you still there?" )) # OR # Advanced handler with retry logic async def handle_user_idle ( processor , retry_count ): if retry_count == 1 : # First attempt - gentle reminder await processor.push_frame(TTSSpeakFrame( "Are you still there?" )) return True # Continue monitoring elif retry_count == 2 : # Second attempt - more direct prompt await processor.push_frame(TTSSpeakFrame( "Would you like to continue our conversation?" )) return True # Continue monitoring else : # Third attempt - end conversation await processor.push_frame(TTSSpeakFrame( "I'll leave you for now. Have a nice day!" )) await processor.push_frame(EndFrame(), FrameDirection. 
UPSTREAM ) return False # Stop monitoring Step 2: Create the Idle Processor Initialize the processor with your callback and desired timeout: Copy Ask AI from pipecat.processors.user_idle_processor import UserIdleProcessor user_idle = UserIdleProcessor( callback = handle_user_idle, # Your callback function timeout = 5.0 , # Seconds of inactivity before triggering ) Step 3: Add to Your Pipeline Place the processor after speech detection but before context handling: Copy Ask AI pipeline = Pipeline( [ transport.input(), stt, # Speech-to-text user_idle, # Add idle detection here context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant(), ] ) Best Practices Set appropriate timeouts : Shorter timeouts (5-10 seconds) work well for voice conversations Use escalating responses : Start with gentle reminders and gradually become more direct Limit retry attempts : After 2-3 unsuccessful attempts, consider ending the conversation gracefully by pushing an EndFrame Next Steps Try the User Idle Example Explore a complete working example that demonstrates how to detect and respond to user inactivity in Pipecat. UserIdleProcessor Reference Read the complete API reference documentation for advanced configuration options and callback patterns. Implementing idle user detection improves the conversational experience by ensuring your bot can handle periods of user inactivity gracefully, either by prompting for re-engagement or politely ending the conversation when appropriate. Custom FrameProcessor Ending a Pipeline On this page Overview How It Works Basic Implementation Step 1: Create a Handler Function Step 2: Create the Idle Processor Step 3: Add to Your Pipeline Best Practices Next Steps Assistant Responses are generated using AI and may contain mistakes.
fundamentals_end-pipeline_2929a644.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/end-pipeline#overview
Title: Ending a Pipeline - Pipecat
==================================================

Ending a Pipeline - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Ending a Pipeline Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Overview Properly ending a Pipecat pipeline is essential to prevent hanging processes and ensure clean shutdown of your session and related infrastructure. This guide covers different approaches to pipeline termination and provides best practices for each scenario. Shutdown Approaches Pipecat provides two primary methods for shutting down a pipeline: Graceful Shutdown : Allows completion of in-progress processing before termination Immediate Shutdown : Cancels all tasks immediately Each approach is designed for different use cases, as detailed below. Graceful Shutdown A graceful shutdown is ideal when you want the bot to properly end a conversation. For example, you might want to terminate a session after the bot has completed a specific task or reached a natural conclusion. This approach ensures that any final messages from the bot are processed and delivered before the pipeline terminates. Implementation To implement a graceful shutdown, there are two options: Push an EndFrame from outside your pipeline using the pipeline task: Copy Ask AI await task.queue_frame(EndFrame()) Push an EndTaskFrame upstream from inside your pipeline. For example, inside a function call: Copy Ask AI async def end_conversation ( params : FunctionCallParams): await params.llm.push_frame(TTSSpeakFrame( "Have a nice day!" )) # Signal that the task should end after processing this frame await params.llm.push_frame(EndTaskFrame(), FrameDirection. UPSTREAM ) How Graceful Shutdown Works In both cases, an EndFrame is pushed downstream from the beginning of the pipeline: EndFrame s are queued, so they’ll process after any pending frames (like goodbye messages) All processors in the pipeline will shutdown when processing the EndFrame Once the EndFrame reaches the sink of the PipelineTask , the Pipeline is ready to shut down The Pipecat processor terminates and related resources are released Graceful shutdowns allow your bot to say goodbye and complete any final actions before terminating. Immediate Shutdown An immediate shutdown is appropriate when the human participant is no longer active in the conversation. For example: In a client/server app, when the user closes the browser tab or ends the session In a phone call, when the user hangs up When an error occurs that requires immediate termination In these scenarios, there’s no value in having the bot complete its current turn. 
Implementation To implement an immediate shutdown, you can use event handlers to, for example, detect disconnections and then push a CancelFrame : Copy Ask AI @transport.event_handler ( "on_client_closed" ) async def on_client_closed ( transport , client ): logger.info( f "Client closed connection" ) await task.cancel() How Immediate Shutdown Works An event triggers the cancellation (like a client disconnection) task.cancel() is called, which pushes a CancelFrame downstream from the PipelineTask CancelFrame s are SystemFrame s and are not queued Processors that handle the CancelFrame immediate shutdown and push the frame downstream Once the CancelFrame reaches the sink of the PipelineTask , the Pipeline is ready to shut down Immediate shutdowns will discard any pending frames in the pipeline. Use this approach when completing the conversation is no longer necessary. Pipeline Idle Detection In addition to the two explicit shutdown mechanisms, Pipecat includes a backup mechanism to prevent hanging pipelines—Pipeline Idle Detection. This feature monitors activity in your pipeline and can automatically cancel tasks when no meaningful bot interactions are occurring for an extended period. It serves as a safety net to conditionally terminate the pipeline if anomalous behavior occurs. Pipeline Idle Detection is enabled by default and helps prevent resources from being wasted on inactive conversations. For more information on configuring and customizing this feature, see the Pipeline Idle Detection documentation. Best Practices Use graceful shutdowns when you want to let the bot complete its conversation Use immediate shutdowns when the human participant has already disconnected Implement error handling to ensure pipelines can terminate even when exceptions occur Configure idle detection timeouts appropriate for your use case By following these practices, you’ll ensure that your Pipecat pipelines terminate properly and efficiently, preventing resource leaks and improving overall system reliability. Detecting Idle Users Function Calling On this page Overview Shutdown Approaches Graceful Shutdown Implementation How Graceful Shutdown Works Immediate Shutdown Implementation How Immediate Shutdown Works Pipeline Idle Detection Best Practices Assistant Responses are generated using AI and may contain mistakes.
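For example, a minimal sketch of a graceful shutdown triggered from outside the pipeline (assuming task is your PipelineTask): because EndFrame is queued behind the goodbye message, the farewell is spoken before the pipeline shuts down.

from pipecat.frames.frames import EndFrame, TTSSpeakFrame

# Queue a final message, then end: frames are processed in order,
# so the TTS goodbye plays before the EndFrame stops the pipeline.
await task.queue_frames([
    TTSSpeakFrame("Thanks for chatting. Goodbye!"),
    EndFrame(),
])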
fundamentals_function-calling_053f634a.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/function-calling#param-result-callback
Title: Function Calling - Pipecat
==================================================

Function Calling - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Function Calling Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Understanding Function Calling Function calling (also known as tool calling) allows LLMs to request information from external services and APIs. This enables your bot to access real-time data and perform actions that aren’t part of its training data. For example, you could give your bot the ability to: Check current weather conditions Look up stock prices Query a database Control smart home devices Schedule appointments Here’s how it works: You define functions the LLM can use and register them to the LLM service used in your pipeline When needed, the LLM requests a function call Your application executes any corresponding functions The result is sent back to the LLM The LLM uses this information in its response Implementation 1. Define Functions Pipecat provides a standardized FunctionSchema that works across all supported LLM providers. This makes it easy to define functions once and use them with any provider. As a shorthand, you could also bypass specifying a function configuration at all and instead use “direct” functions. Under the hood, these are converted to FunctionSchema s. Using the Standard Schema (Recommended) Copy Ask AI from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Define a function using the standard schema weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather in a location" , properties = { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, required = [ "location" , "format" ] ) # Create a tools schema with your functions tools = ToolsSchema( standard_tools = [weather_function]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) The ToolsSchema will be automatically converted to the correct format for your LLM provider through adapters. Using Direct Functions (Shorthand) You can bypass specifying a function configuration (as a FunctionSchema or in a provider-specific format) and instead pass the function directly to your ToolsSchema . Pipecat will auto-configure the function, gathering relevant metadata from its signature and docstring. Metadata includes: name description properties (including individual property descriptions) list of required properties Note that the function signature is a bit different when using direct functions. The first parameter is FunctionCallParams , followed by any others necessary for the function. 
Copy Ask AI from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.services.llm_service import FunctionCallParams # Define a direct function async def get_current_weather ( params : FunctionCallParams, location : str , format : str ): """Get the current weather. Args: location: The city and state, e.g. "San Francisco, CA". format: The temperature unit to use. Must be either "celsius" or "fahrenheit". """ weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Create a tools schema, passing your function directly to it tools = ToolsSchema( standard_tools = [get_current_weather]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) Using Provider-Specific Formats (Alternative) You can also define functions in the provider-specific format if needed: OpenAI Anthropic Gemini Copy Ask AI from openai.types.chat import ChatCompletionToolParam # OpenAI native format tools = [ ChatCompletionToolParam( type = "function" , function = { "name" : "get_current_weather" , "description" : "Get the current weather" , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, "required" : [ "location" , "format" ], }, }, ) ] Provider-Specific Custom Tools Some providers support unique tools that don’t fit the standard function schema. For these cases, you can add custom tools: Copy Ask AI from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema # Standard functions weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather" , properties = { "location" : { "type" : "string" }}, required = [ "location" ] ) # Custom Gemini search tool gemini_search_tool = { "web_search" : { "description" : "Search the web for information" } } # Create a tools schema with both standard and custom tools tools = ToolsSchema( standard_tools = [weather_function], custom_tools = { AdapterType. GEMINI : [gemini_search_tool] } ) See the provider-specific documentation for details on custom tools and their formats. 2. Register Function Handlers Register handlers for your functions using one of these LLM service methods : register_function register_direct_function Which one you use depends on whether your function is a “direct” function . Non-Direct Function Direct Function Copy Ask AI from pipecat.services.llm_service import FunctionCallParams llm = OpenAILLMService( api_key = "your-api-key" ) # Main function handler - called to execute the function async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data from your API weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Register the function llm.register_function( "get_current_weather" , fetch_weather_from_api, ) 3. Create the Pipeline Include your LLM service in your pipeline with the registered functions: Copy Ask AI # Initialize the LLM context with your function schemas context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." 
}], tools = tools ) # Create the context aggregator to collect the user and assistant context context_aggregator = llm.create_context_aggregator(context) # Create the pipeline pipeline = Pipeline([ transport.input(), # Input from the transport stt, # STT processing context_aggregator.user(), # User context aggregation llm, # LLM processing tts, # TTS processing transport.output(), # Output to the transport context_aggregator.assistant(), # Assistant context aggregation ]) Function Handler Details FunctionCallParams The FunctionCallParams object contains all the information needed for handling function calls: params : FunctionCallParams function_name : Name of the called function arguments : Arguments passed by the LLM tool_call_id : Unique identifier for the function call llm : Reference to the LLM service context : Current conversation context result_callback : Async function to return results function_name str Name of the function being called tool_call_id str Unique identifier for the function call arguments Mapping[str, Any] Arguments passed by the LLM to the function llm LLMService Reference to the LLM service that initiated the call context OpenAILLMContext Current conversation context result_callback FunctionCallResultCallback Async callback function to return results Handler Structure Your function handler should: Receive necessary arguments, either: From params.arguments Directly From function arguments, if using direct functions Process data or call external services Return results via params.result_callback(result) Non-Direct Function Direct Function Copy Ask AI async def fetch_weather_from_api ( params : FunctionCallParams): try : # Extract arguments location = params.arguments.get( "location" ) format_type = params.arguments.get( "format" , "celsius" ) # Call external API api_result = await weather_api.get_weather(location, format_type) # Return formatted result await params.result_callback({ "location" : location, "temperature" : api_result[ "temp" ], "conditions" : api_result[ "conditions" ], "unit" : format_type }) except Exception as e: # Handle errors await params.result_callback({ "error" : f "Failed to get weather: { str (e) } " }) Controlling Function Call Behavior (Advanced) When returning results from a function handler, you can control how the LLM processes those results using a FunctionCallResultProperties object passed to the result callback. It can be handy to skip a completion when you have back-to-back function calls. Note, if you skip a completion, you must manually trigger one from the context. 
Properties run_llm Optional[bool] Controls whether the LLM should generate a response after the function call: True : Run LLM after function call (default if no other function calls in progress) False : Don’t run LLM after function call None : Use default behavior on_context_updated Optional[Callable[[], Awaitable[None]]] Optional callback that runs after the function result is added to the context Example Usage Copy Ask AI from pipecat.frames.frames import FunctionCallResultProperties from pipecat.services.llm_service import FunctionCallParams async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data weather_data = { "conditions" : "sunny" , "temperature" : "75" } # Don't run LLM after this function call properties = FunctionCallResultProperties( run_llm = False ) await params.result_callback(weather_data, properties = properties) async def query_database ( params : FunctionCallParams): # Query database results = await db.query(params.arguments[ "query" ]) async def on_update (): await notify_system( "Database query complete" ) # Run LLM after function call and notify when context is updated properties = FunctionCallResultProperties( run_llm = True , on_context_updated = on_update ) await params.result_callback(results, properties = properties) Next steps Check out the function calling examples to see a complete example for specific LLM providers. Refer to your LLM provider’s documentation to learn more about their function calling capabilities. Ending a Pipeline Muting User Input On this page Understanding Function Calling Implementation 1. Define Functions Using the Standard Schema (Recommended) Using Direct Functions (Shorthand) Using Provider-Specific Formats (Alternative) Provider-Specific Custom Tools 2. Register Function Handlers 3. Create the Pipeline Function Handler Details FunctionCallParams Handler Structure Controlling Function Call Behavior (Advanced) Properties Example Usage Next steps Assistant Responses are generated using AI and may contain mistakes.
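Since a skipped completion must be triggered manually from the context, here is a hedged sketch of one way to do that from a handler. orders_api is hypothetical, and it assumes task and context_aggregator are reachable in scope (for example via a closure).

from pipecat.frames.frames import FunctionCallResultProperties
from pipecat.services.llm_service import FunctionCallParams

async def lookup_order(params: FunctionCallParams):
    order = await orders_api.get(params.arguments["order_id"])  # hypothetical API call

    async def on_update():
        # We skipped the automatic completion, so trigger one from the current context
        await task.queue_frames([context_aggregator.user().get_context_frame()])

    properties = FunctionCallResultProperties(run_llm=False, on_context_updated=on_update)
    await params.result_callback(order, properties=properties)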
fundamentals_function-calling_1b56ceae.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/guides/fundamentals/function-calling#using-direct-functions-shorthand
Title: Function Calling - Pipecat
==================================================
Function Calling - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Fundamentals Function Calling Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Guides Fundamentals Context Management Custom FrameProcessor Detecting Idle Users Ending a Pipeline Function Calling Muting User Input Recording Audio Recording Transcripts Features Gemini Multimodal Live Metrics Noise cancellation with Krisp OpenAI Audio Models and APIs Pipecat Flows Telephony Overview Dial-in: WebRTC (Daily) Dial-in: WebRTC (Twilio + Daily) Dial-in: Twilio (Media Streams) Dialout: WebRTC (Daily) Deploying your bot Overview Deployment pattern Example: Pipecat Cloud Example: Fly.io Example: Cerebrium Example: Modal Understanding Function Calling Function calling (also known as tool calling) allows LLMs to request information from external services and APIs. This enables your bot to access real-time data and perform actions that aren’t part of its training data. For example, you could give your bot the ability to: Check current weather conditions Look up stock prices Query a database Control smart home devices Schedule appointments Here’s how it works: You define functions the LLM can use and register them to the LLM service used in your pipeline When needed, the LLM requests a function call Your application executes any corresponding functions The result is sent back to the LLM The LLM uses this information in its response Implementation 1. Define Functions Pipecat provides a standardized FunctionSchema that works across all supported LLM providers. This makes it easy to define functions once and use them with any provider. As a shorthand, you could also bypass specifying a function configuration at all and instead use “direct” functions. Under the hood, these are converted to FunctionSchema s. Using the Standard Schema (Recommended) Copy Ask AI from pipecat.adapters.schemas.function_schema import FunctionSchema from pipecat.adapters.schemas.tools_schema import ToolsSchema # Define a function using the standard schema weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather in a location" , properties = { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, required = [ "location" , "format" ] ) # Create a tools schema with your functions tools = ToolsSchema( standard_tools = [weather_function]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) The ToolsSchema will be automatically converted to the correct format for your LLM provider through adapters. Using Direct Functions (Shorthand) You can bypass specifying a function configuration (as a FunctionSchema or in a provider-specific format) and instead pass the function directly to your ToolsSchema . Pipecat will auto-configure the function, gathering relevant metadata from its signature and docstring. Metadata includes: name description properties (including individual property descriptions) list of required properties Note that the function signature is a bit different when using direct functions. The first parameter is FunctionCallParams , followed by any others necessary for the function. 
Copy Ask AI from pipecat.adapters.schemas.tools_schema import ToolsSchema from pipecat.services.llm_service import FunctionCallParams # Define a direct function async def get_current_weather ( params : FunctionCallParams, location : str , format : str ): """Get the current weather. Args: location: The city and state, e.g. "San Francisco, CA". format: The temperature unit to use. Must be either "celsius" or "fahrenheit". """ weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Create a tools schema, passing your function directly to it tools = ToolsSchema( standard_tools = [get_current_weather]) # Pass this to your LLM context context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." }], tools = tools ) Using Provider-Specific Formats (Alternative) You can also define functions in the provider-specific format if needed: OpenAI Anthropic Gemini Copy Ask AI from openai.types.chat import ChatCompletionToolParam # OpenAI native format tools = [ ChatCompletionToolParam( type = "function" , function = { "name" : "get_current_weather" , "description" : "Get the current weather" , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city and state, e.g. San Francisco, CA" , }, "format" : { "type" : "string" , "enum" : [ "celsius" , "fahrenheit" ], "description" : "The temperature unit to use." , }, }, "required" : [ "location" , "format" ], }, }, ) ] Provider-Specific Custom Tools Some providers support unique tools that don’t fit the standard function schema. For these cases, you can add custom tools: Copy Ask AI from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema # Standard functions weather_function = FunctionSchema( name = "get_current_weather" , description = "Get the current weather" , properties = { "location" : { "type" : "string" }}, required = [ "location" ] ) # Custom Gemini search tool gemini_search_tool = { "web_search" : { "description" : "Search the web for information" } } # Create a tools schema with both standard and custom tools tools = ToolsSchema( standard_tools = [weather_function], custom_tools = { AdapterType. GEMINI : [gemini_search_tool] } ) See the provider-specific documentation for details on custom tools and their formats. 2. Register Function Handlers Register handlers for your functions using one of these LLM service methods : register_function register_direct_function Which one you use depends on whether your function is a “direct” function . Non-Direct Function Direct Function Copy Ask AI from pipecat.services.llm_service import FunctionCallParams llm = OpenAILLMService( api_key = "your-api-key" ) # Main function handler - called to execute the function async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data from your API weather_data = { "conditions" : "sunny" , "temperature" : "75" } await params.result_callback(weather_data) # Register the function llm.register_function( "get_current_weather" , fetch_weather_from_api, ) 3. Create the Pipeline Include your LLM service in your pipeline with the registered functions: Copy Ask AI # Initialize the LLM context with your function schemas context = OpenAILLMContext( messages = [{ "role" : "system" , "content" : "You are a helpful assistant." 
}], tools = tools ) # Create the context aggregator to collect the user and assistant context context_aggregator = llm.create_context_aggregator(context) # Create the pipeline pipeline = Pipeline([ transport.input(), # Input from the transport stt, # STT processing context_aggregator.user(), # User context aggregation llm, # LLM processing tts, # TTS processing transport.output(), # Output to the transport context_aggregator.assistant(), # Assistant context aggregation ]) Function Handler Details FunctionCallParams The FunctionCallParams object contains all the information needed for handling function calls: params : FunctionCallParams function_name : Name of the called function arguments : Arguments passed by the LLM tool_call_id : Unique identifier for the function call llm : Reference to the LLM service context : Current conversation context result_callback : Async function to return results function_name str Name of the function being called tool_call_id str Unique identifier for the function call arguments Mapping[str, Any] Arguments passed by the LLM to the function llm LLMService Reference to the LLM service that initiated the call context OpenAILLMContext Current conversation context result_callback FunctionCallResultCallback Async callback function to return results Handler Structure Your function handler should: Receive necessary arguments, either: From params.arguments Directly From function arguments, if using direct functions Process data or call external services Return results via params.result_callback(result) Non-Direct Function Direct Function Copy Ask AI async def fetch_weather_from_api ( params : FunctionCallParams): try : # Extract arguments location = params.arguments.get( "location" ) format_type = params.arguments.get( "format" , "celsius" ) # Call external API api_result = await weather_api.get_weather(location, format_type) # Return formatted result await params.result_callback({ "location" : location, "temperature" : api_result[ "temp" ], "conditions" : api_result[ "conditions" ], "unit" : format_type }) except Exception as e: # Handle errors await params.result_callback({ "error" : f "Failed to get weather: { str (e) } " }) Controlling Function Call Behavior (Advanced) When returning results from a function handler, you can control how the LLM processes those results using a FunctionCallResultProperties object passed to the result callback. It can be handy to skip a completion when you have back-to-back function calls. Note, if you skip a completion, you must manually trigger one from the context. 
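The FunctionCallParams fields listed above (function_name, tool_call_id, arguments, llm, context, result_callback) are described but never inspected in the examples. A small sketch of a handler that logs the call metadata before responding (the handler name is illustrative):

from pipecat.services.llm_service import FunctionCallParams

async def inspect_call_handler(params: FunctionCallParams):
    # Call metadata exposed on FunctionCallParams
    print(f"Function requested: {params.function_name}")
    print(f"Tool call id: {params.tool_call_id}")
    print(f"Raw arguments from the LLM: {params.arguments}")

    # params.llm and params.context are also available if the handler needs
    # the originating service or the current conversation history.
    location = params.arguments.get("location", "unknown")
    await params.result_callback({"location": location, "conditions": "sunny"})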
Properties run_llm Optional[bool] Controls whether the LLM should generate a response after the function call: True : Run LLM after function call (default if no other function calls in progress) False : Don’t run LLM after function call None : Use default behavior on_context_updated Optional[Callable[[], Awaitable[None]]] Optional callback that runs after the function result is added to the context Example Usage Copy Ask AI from pipecat.frames.frames import FunctionCallResultProperties from pipecat.services.llm_service import FunctionCallParams async def fetch_weather_from_api ( params : FunctionCallParams): # Fetch weather data weather_data = { "conditions" : "sunny" , "temperature" : "75" } # Don't run LLM after this function call properties = FunctionCallResultProperties( run_llm = False ) await params.result_callback(weather_data, properties = properties) async def query_database ( params : FunctionCallParams): # Query database results = await db.query(params.arguments[ "query" ]) async def on_update (): await notify_system( "Database query complete" ) # Run LLM after function call and notify when context is updated properties = FunctionCallResultProperties( run_llm = True , on_context_updated = on_update ) await params.result_callback(results, properties = properties) Next steps Check out the function calling examples to see a complete example for specific LLM providers. Refer to your LLM provider’s documentation to learn more about their function calling capabilities. Ending a Pipeline Muting User Input On this page Understanding Function Calling Implementation 1. Define Functions Using the Standard Schema (Recommended) Using Direct Functions (Shorthand) Using Provider-Specific Formats (Alternative) Provider-Specific Custom Tools 2. Register Function Handlers 3. Create the Pipeline Function Handler Details FunctionCallParams Handler Structure Controlling Function Call Behavior (Advanced) Properties Example Usage Next steps Assistant Responses are generated using AI and may contain mistakes.
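The registration step names register_direct_function, but the direct tab of that snippet is not captured in this scrape. A sketch of how it might be used with the direct get_current_weather function from the earlier example (check your installed Pipecat version for the exact signature):

from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.services.openai.llm import OpenAILLMService

llm = OpenAILLMService(api_key="your-api-key")

# For a direct function, the name, description, and parameter schema are
# derived from the signature and docstring, so only the function is passed.
llm.register_direct_function(get_current_weather)

# The same function object goes straight into the tools schema
tools = ToolsSchema(standard_tools=[get_current_weather])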