diff --git a/analytics_sentry_239efe45.txt b/analytics_sentry_239efe45.txt
new file mode 100644
index 0000000000000000000000000000000000000000..3bcc7505a57eaf5fbca37851adf929b900248732
--- /dev/null
+++ b/analytics_sentry_239efe45.txt
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/analytics/sentry#usage-example
+Title: Sentry Metrics - Pipecat
+==================================================
+
+Sentry Metrics

Overview

SentryMetrics extends FrameProcessorMetrics to provide performance monitoring integration with Sentry. It tracks Time to First Byte (TTFB) and processing duration metrics for frame processors.
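The TTFB idea the page describes can be illustrated with a small self-contained sketch: TTFB is the delay between starting a request to a service and receiving its first chunk of output. The `slow_service` generator below is a hypothetical stand-in for an LLM or TTS call, not Pipecat or Sentry code.

```python
import time

def slow_service():
    """Hypothetical streaming service: sleeps, then yields chunks."""
    time.sleep(0.05)          # pretend network + model latency
    yield b"first chunk"      # the "first byte" arrives here
    yield b"second chunk"

start = time.monotonic()
chunks = slow_service()       # creating the generator does no work yet
first = next(chunks)          # TTFB measurement stops at the first chunk
ttfb = time.monotonic() - start
print(f"TTFB: {ttfb:.3f}s, first chunk: {first!r}")
```

SentryMetrics records this kind of interval (plus total processing duration) as Sentry transactions for each instrumented frame processor.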
Installation

To use Sentry metrics, install the Sentry SDK:

    pip install "pipecat-ai[sentry]"

Configuration

Sentry must be initialized in your application before metrics will be collected:

    import sentry_sdk

    sentry_sdk.init(
        dsn="your-sentry-dsn",
        traces_sample_rate=1.0,
    )

Usage Example

    import os

    import sentry_sdk

    from pipecat.audio.vad.silero import SileroVADAnalyzer
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
    from pipecat.processors.metrics.sentry import SentryMetrics
    from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
    from pipecat.services.openai.llm import OpenAILLMService
    from pipecat.transports.services.daily import DailyParams, DailyTransport

    async def create_metrics_pipeline():
        sentry_sdk.init(
            dsn="your-sentry-dsn",
            traces_sample_rate=1.0,
        )

        transport = DailyTransport(
            room_url,
            token,
            "Chatbot",
            DailyParams(
                audio_out_enabled=True,
                audio_in_enabled=True,
                video_out_enabled=False,
                vad_analyzer=SileroVADAnalyzer(),
                transcription_enabled=True,
            ),
        )

        tts = ElevenLabsTTSService(
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            metrics=SentryMetrics(),
        )

        llm = OpenAILLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            model="gpt-4o",
            metrics=SentryMetrics(),
        )

        messages = [
            {
                "role": "system",
                "content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself. Keep all your responses to 12 words or fewer.",
            },
        ]

        context = OpenAILLMContext(messages)
        context_aggregator = llm.create_context_aggregator(context)

        # Use in pipeline
        pipeline = Pipeline([
            transport.input(),
            context_aggregator.user(),
            llm,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ])

Transaction Information

Each transaction includes:
- Operation type (ttfb or processing)
- Description with processor name
- Start timestamp
- End timestamp
- Unique transaction ID

Fallback Behavior

If Sentry is not available (not installed or not initialized):
- Warning logs are generated
- Metric methods execute without error
- No data is sent to Sentry

Notes

- Requires the Sentry SDK to be installed and initialized
- Thread-safe metric collection
- Automatic transaction management
- Supports selective TTFB reporting
- Integrates with Sentry's performance monitoring
- Provides detailed timing information
- Maintains timing data even when Sentry is unavailable
\ No newline at end of file
diff --git a/android_introduction_8b47f54f.txt b/android_introduction_8b47f54f.txt
new file mode 100644
index 0000000000000000000000000000000000000000..64710eb7c312de28e179128750f09ab730e95663
--- /dev/null
+++ b/android_introduction_8b47f54f.txt
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/client/android/introduction
+Title: SDK Introduction - Pipecat
+==================================================
+
+SDK Introduction
The Pipecat Android SDK provides a Kotlin implementation for building voice and multimodal AI applications on Android. It handles:

- Real-time audio and video streaming
- Bot communication and state management
- Media device handling
- Configuration management
- Event handling

Installation

Add the dependency for your chosen transport to your build.gradle file. For example, to use the Daily transport:

    implementation "ai.pipecat:daily-transport:0.3.3"

Example

Here's a simple example using Daily as the transport layer. Note that the clientConfig is optional and depends on what is required by the bot backend.

    val clientConfig = listOf(
        ServiceConfig(
            service = "llm",
            options = listOf(
                Option("model", "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"),
                Option("messages", Value.Array(
                    Value.Object(
                        "role" to Value.Str("system"),
                        "content" to Value.Str("You are a helpful assistant.")
                    )
                ))
            )
        ),
        ServiceConfig(
            service = "tts",
            options = listOf(
                Option("voice", "79a125e8-cd45-4c13-8a67-188112f4dd22")
            )
        )
    )

    val callbacks = object : RTVIEventCallbacks() {
        override fun onBackendError(message: String) {
            Log.e(TAG, "Error from backend: $message")
        }
    }

    val options = RTVIClientOptions(
        services = listOf(
            ServiceRegistration("llm", "together"),
            ServiceRegistration("tts", "cartesia")
        ),
        params = RTVIClientParams(baseUrl = "", config = clientConfig)
    )

    val client = RTVIClient(DailyTransport.Factory(context), callbacks, options)

    client.connect().await() // Using Coroutines
    // Or using callbacks:
    // client.connect().withCallback { /* handle completion */ }

Documentation

- API Reference: Complete SDK API documentation
- Daily Transport: WebRTC implementation using Daily
- OpenAIRealTimeWebRTCTransport: API Reference
\ No newline at end of file
diff --git a/audio_audio-buffer-processor_18595c52.txt b/audio_audio-buffer-processor_18595c52.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ca87af10342fd8beaf10cc9ff5a19a1e7c6bc260
--- /dev/null
+++ b/audio_audio-buffer-processor_18595c52.txt
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#event-handlers
+Title: AudioBufferProcessor - Pipecat
+==================================================
+
+AudioBufferProcessor

Overview

The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations.
It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows.

Constructor

    AudioBufferProcessor(
        sample_rate=None,
        num_channels=1,
        buffer_size=0,
        enable_turn_audio=False,
        **kwargs
    )

Parameters

sample_rate (Optional[int], default: None)
The desired output sample rate in Hz. If None, uses the transport's sample rate from the StartFrame.

num_channels (int, default: 1)
Number of output audio channels:
- 1: Mono output (user and bot audio are mixed together)
- 2: Stereo output (user audio on left channel, bot audio on right channel)

buffer_size (int, default: 0)
Buffer size in bytes that triggers audio data events:
- 0: Events only trigger when recording stops
- >0: Events trigger whenever the buffer reaches this size (useful for chunked processing)

enable_turn_audio (bool, default: False)
Whether to enable per-turn audio event handlers (on_user_turn_audio_data and on_bot_turn_audio_data).

Properties

sample_rate

    @property
    def sample_rate(self) -> int

The current sample rate of the audio processor in Hz.

num_channels

    @property
    def num_channels(self) -> int

The number of channels in the audio output (1 for mono, 2 for stereo).

Methods

start_recording()

    async def start_recording()

Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers.

stop_recording()

    async def stop_recording()

Stop recording and trigger final audio data handlers with any remaining buffered audio.

has_audio()

    def has_audio() -> bool

Check if both user and bot audio buffers contain data. Returns True if both buffers contain audio data.

Event Handlers

The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @processor.event_handler() decorator.
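The decorator-based registration pattern can be sketched in isolation. FakeProcessor below is a hypothetical stand-in that only demonstrates how an `event_handler("...")` decorator can collect async handlers and invoke them later; it is not the actual AudioBufferProcessor implementation.

```python
import asyncio

class FakeProcessor:
    """Hypothetical sketch of a decorator-based event handler registry."""

    def __init__(self):
        self._handlers = {}

    def event_handler(self, name):
        # Returns a decorator that registers the function under `name`.
        def decorator(func):
            self._handlers.setdefault(name, []).append(func)
            return func
        return decorator

    async def _call_event_handler(self, name, *args):
        # Invoke every registered handler, passing the processor first.
        for handler in self._handlers.get(name, []):
            await handler(self, *args)

audiobuffer = FakeProcessor()
received = []

@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio, sample_rate, num_channels):
    received.append((len(audio), sample_rate, num_channels))

# Simulate the processor firing the event with 1s of 16 kHz mono 16-bit audio.
asyncio.run(audiobuffer._call_event_handler("on_audio_data", b"\x00" * 32000, 16000, 1))
print(received)  # [(32000, 16000, 1)]
```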
on_audio_data

Triggered when buffer_size is reached or recording stops, providing merged audio.

    @audiobuffer.event_handler("on_audio_data")
    async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
        # Handle merged audio data
        pass

Parameters:
- buffer: The AudioBufferProcessor instance
- audio: Merged audio data (format depends on num_channels setting)
- sample_rate: Sample rate in Hz
- num_channels: Number of channels (1 or 2)

on_track_audio_data

Triggered alongside on_audio_data, providing separate user and bot audio tracks.

    @audiobuffer.event_handler("on_track_audio_data")
    async def on_track_audio_data(buffer, user_audio: bytes, bot_audio: bytes, sample_rate: int, num_channels: int):
        # Handle separate audio tracks
        pass

Parameters:
- buffer: The AudioBufferProcessor instance
- user_audio: Raw user audio bytes (always mono)
- bot_audio: Raw bot audio bytes (always mono)
- sample_rate: Sample rate in Hz
- num_channels: Always 1 for individual tracks

on_user_turn_audio_data

Triggered when a user speaking turn ends. Requires enable_turn_audio=True.

    @audiobuffer.event_handler("on_user_turn_audio_data")
    async def on_user_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
        # Handle user turn audio
        pass

Parameters:
- buffer: The AudioBufferProcessor instance
- audio: Audio data from the user's speaking turn
- sample_rate: Sample rate in Hz
- num_channels: Always 1 (mono)

on_bot_turn_audio_data

Triggered when a bot speaking turn ends. Requires enable_turn_audio=True.
    @audiobuffer.event_handler("on_bot_turn_audio_data")
    async def on_bot_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
        # Handle bot turn audio
        pass

Parameters:
- buffer: The AudioBufferProcessor instance
- audio: Audio data from the bot's speaking turn
- sample_rate: Sample rate in Hz
- num_channels: Always 1 (mono)

Audio Processing Features

- Automatic resampling: Converts incoming audio to the specified sample rate
- Buffer synchronization: Aligns user and bot audio streams temporally
- Silence insertion: Fills gaps in non-continuous audio streams to maintain timing
- Turn tracking: Monitors speaking turns when enable_turn_audio=True

Integration Notes

STT Audio Passthrough

If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor (audio_passthrough is enabled by default):

    stt = DeepgramSTTService(
        api_key=os.getenv("DEEPGRAM_API_KEY"),
        audio_passthrough=True,
    )

Pipeline Placement

Add the AudioBufferProcessor after transport.output() to capture both user and bot audio:

    pipeline = Pipeline([
        transport.input(),
        # ... other processors ...
        transport.output(),
        audiobuffer,  # Place after audio output
        # ... remaining processors ...
    ])
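A common use of the merged audio from on_audio_data is writing it to a WAV file. The sketch below uses only the standard library and assumes the buffers hold 16-bit PCM (2 bytes per sample), which is how raw audio bytes are commonly delivered; the silence bytes stand in for real captured audio.

```python
import io
import wave

def save_wav(audio: bytes, sample_rate: int, num_channels: int) -> bytes:
    """Wrap raw 16-bit PCM bytes in a WAV container and return it as bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(num_channels)
        wf.setsampwidth(2)  # 16-bit PCM = 2 bytes per sample
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
    return buffer.getvalue()

# One second of 24 kHz stereo silence: 24000 frames * 2 channels * 2 bytes.
wav_bytes = save_wav(b"\x00" * 96000, 24000, 2)
print(wav_bytes[:4])  # b'RIFF'
```

Inside a real handler you would call something like `save_wav(audio, sample_rate, num_channels)` and write the result to disk or object storage.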
\ No newline at end of file
diff --git a/audio_audio-buffer-processor_9325cae0.txt b/audio_audio-buffer-processor_9325cae0.txt
new file mode 100644
index 0000000000000000000000000000000000000000..c25b1a836046d95d90ce2615159fda92b994e5f1
--- /dev/null
+++ b/audio_audio-buffer-processor_9325cae0.txt
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#has-audio
+Title: AudioBufferProcessor - Pipecat
+==================================================
\ No newline at end of file
diff --git a/audio_audio-buffer-processor_f58d7ef4.txt b/audio_audio-buffer-processor_f58d7ef4.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f2963b0c4224ee1cbbd5c61bdb8209e316f2c315
--- /dev/null
+++ b/audio_audio-buffer-processor_f58d7ef4.txt
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#properties
+Title: AudioBufferProcessor - Pipecat
+==================================================
\ No newline at end of file
diff --git a/audio_koala-filter_1d2eb782.txt b/audio_koala-filter_1d2eb782.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ca93839945fccedfbfdba685fd6056a379e2161c
--- /dev/null
+++ b/audio_koala-filter_1d2eb782.txt
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/koala-filter#input-frames
+Title: KoalaFilter - Pipecat
+==================================================
+
+KoalaFilter

Overview

KoalaFilter is an audio processor that reduces background noise in real-time audio streams using Koala Noise Suppression technology from Picovoice. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise.

To use Koala, you need a Picovoice access key. Get started at the Picovoice Console.
Installation

The Koala filter requires additional dependencies:

    pip install "pipecat-ai[koala]"

You'll also need to set up your Koala access key as an environment variable: KOALA_ACCESS_KEY

Constructor Parameters

access_key (str, required)
Picovoice access key for using the Koala noise suppression service.

Input Frames

FilterEnableFrame
Control frame to toggle filtering on/off:

    from pipecat.frames.frames import FilterEnableFrame

    # Disable noise reduction
    await task.queue_frame(FilterEnableFrame(False))

    # Re-enable noise reduction
    await task.queue_frame(FilterEnableFrame(True))

Usage Example

    from pipecat.audio.filters.koala_filter import KoalaFilter

    transport = DailyTransport(
        room_url,
        token,
        "Respond bot",
        DailyParams(
            audio_in_filter=KoalaFilter(access_key=os.getenv("KOALA_ACCESS_KEY")),  # Enable Koala noise reduction
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

Audio Flow

(flow diagram omitted)

Notes

- Requires a Picovoice access key
- Supports real-time audio processing
- Handles 16-bit PCM audio format
- Can be dynamically enabled/disabled
- Maintains audio quality while reducing noise
- Efficient processing for low latency
- Automatically handles audio frame buffering
- Sample rate must match Koala's required sample rate
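The note that the filter "automatically handles audio frame buffering" refers to the fact that noise suppressors like Koala consume fixed-size frames, while transports deliver chunks of arbitrary length. The sketch below illustrates that buffering pattern in isolation; the 512-sample frame length and the FrameBuffer class are illustrative assumptions, not the pvkoala API or Koala's actual frame size.

```python
FRAME_BYTES = 512 * 2  # 512 samples of 16-bit PCM (illustrative frame length)

class FrameBuffer:
    """Accumulate arbitrary-length chunks and emit fixed-size frames."""

    def __init__(self):
        self._pending = b""

    def push(self, chunk: bytes):
        # Append the new chunk, then slice off as many full frames as possible;
        # any remainder stays buffered for the next chunk.
        self._pending += chunk
        frames = []
        while len(self._pending) >= FRAME_BYTES:
            frames.append(self._pending[:FRAME_BYTES])
            self._pending = self._pending[FRAME_BYTES:]
        return frames

buf = FrameBuffer()
out = buf.push(b"\x00" * 1500)   # 1500 bytes -> one 1024-byte frame, 476 left over
out += buf.push(b"\x00" * 1500)  # leftover + 1500 -> one more frame, 952 left over
print([len(f) for f in out], len(buf._pending))  # [1024, 1024] 952
```

Each emitted frame is what would be passed to the suppressor; the filter performs this bookkeeping internally so callers never see partial frames.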
\ No newline at end of file diff --git a/audio_koala-filter_3ef43cec.txt b/audio_koala-filter_3ef43cec.txt new file mode 100644 index 0000000000000000000000000000000000000000..c606115c5b8161d92c5f56f626e3de9b3066d080 --- /dev/null +++ b/audio_koala-filter_3ef43cec.txt @@ -0,0 +1,5 @@ +URL: https://docs.pipecat.ai/server/utilities/audio/koala-filter#usage-example +Title: KoalaFilter - Pipecat +================================================== + +KoalaFilter - Pipecat Pipecat home page Search... ⌘ K Ask AI Search... Navigation Audio Processing KoalaFilter Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing AudioBufferProcessor KoalaFilter KrispFilter NoisereduceFilter SileroVADAnalyzer SoundfileMixer Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline ​ Overview KoalaFilter is an audio processor that reduces background noise in real-time audio streams using Koala Noise Suppression technology from Picovoice. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. To use Koala, you need a Picovoice access key. Get started at Picovoice Console . 
​ Installation
The Koala filter requires additional dependencies:

pip install "pipecat-ai[koala]"

You’ll also need to set up your Koala access key as an environment variable: KOALA_ACCESS_KEY

​ Constructor Parameters
access_key (str, required): Picovoice access key for using the Koala noise suppression service.

​ Input Frames
FilterEnableFrame (Frame): Control frame to toggle filtering on/off.

from pipecat.frames.frames import FilterEnableFrame

# Disable noise reduction
await task.queue_frame(FilterEnableFrame(False))

# Re-enable noise reduction
await task.queue_frame(FilterEnableFrame(True))

​ Usage Example

from pipecat.audio.filters.koala_filter import KoalaFilter

transport = DailyTransport(
    room_url,
    token,
    "Respond bot",
    DailyParams(
        audio_in_filter=KoalaFilter(access_key=os.getenv("KOALA_ACCESS_KEY")),  # Enable Koala noise reduction
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)

​ Audio Flow
(Flow diagram not captured in this text export.)

​ Notes
- Requires a Picovoice access key
- Supports real-time audio processing
- Handles 16-bit PCM audio format
- Can be dynamically enabled/disabled
- Maintains audio quality while reducing noise
- Efficient processing for low latency
- Automatically handles audio frame buffering
- Input sample rate must match Koala’s required sample rate
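As a rough sketch of the kind of transform an audio filter applies to the 16-bit PCM frames noted above, here is a simple amplitude gate in pure Python. This is for illustration only: it is not the Koala algorithm (Koala runs a trained model over fixed-size frames at its required sample rate), and `noise_gate` is a hypothetical helper, not part of Pipecat's API.

```python
import struct


def noise_gate(pcm_bytes: bytes, threshold: int = 500) -> bytes:
    """Zero out 16-bit little-endian PCM samples below `threshold` amplitude.

    A crude stand-in for real noise suppression, shown only to illustrate
    the bytes-in/bytes-out contract an audio filter fulfills.
    """
    count = len(pcm_bytes) // 2
    samples = struct.unpack(f"<{count}h", pcm_bytes)
    gated = [s if abs(s) >= threshold else 0 for s in samples]
    return struct.pack(f"<{count}h", *gated)


# The 100 and 30 samples fall below the gate; louder ones pass through.
frame = struct.pack("<4h", 100, -2000, 30, 9000)
filtered = noise_gate(frame)
```

Note that, like Koala, the output has the same length and format as the input, so the filter can sit transparently in the audio input path.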
\ No newline at end of file diff --git a/client_introduction_a82cc575.txt b/client_introduction_a82cc575.txt new file mode 100644 index 0000000000000000000000000000000000000000..5d0ab30f07fd5c97c126c634044338088cf9c144 --- /dev/null +++ b/client_introduction_a82cc575.txt @@ -0,0 +1,5 @@ +URL: https://docs.pipecat.ai/client/introduction#core-functionality +Title: Client SDKs - Pipecat +================================================== + +Client SDKs - Pipecat

The Client SDKs are currently in transition to a new, simpler API design. The JS and React libraries have already been deployed with these changes, and their corresponding documentation, along with this top-level documentation, has been updated to reflect the latest changes. For transitioning to the new API, please refer to the migration guide. Note that the React Native, iOS, and Android SDKs are still in the process of being updated; their documentation will be updated once the new versions are released. If you have any questions or need assistance, please reach out to us on Discord.

Pipecat provides client SDKs for multiple platforms, all implementing the RTVI (Real-Time Voice and Video Inference) standard. These SDKs make it easy to build real-time AI applications that can handle voice, video, and text interactions.
Javascript: Pipecat JS SDK
React: Pipecat React SDK
React Native: Pipecat React Native SDK
Swift: Pipecat iOS SDK
Kotlin: Pipecat Android SDK
C++: Pipecat C++ SDK

​ Core Functionality
All Pipecat client SDKs provide:
Media Management: Handle device inputs and media streams for audio and video
Bot Integration: Configure and communicate with your Pipecat bot
Session Management: Manage connection state and error handling

​ Core Types
​ PipecatClient
The main class for interacting with Pipecat bots. It is the primary type you will interact with.
​ Transport
The PipecatClient wraps a Transport, which defines and provides the underlying connection mechanism (e.g., WebSocket, WebRTC). Your Pipecat pipeline will contain a corresponding transport.
​ RTVIMessage
Represents a message sent to or received from a Pipecat bot.

​ Simple Usage Examples
​ Connecting to a Bot
Establish ongoing connections via WebSocket or WebRTC for:
- Live voice conversations
- Real-time video processing
- Continuous interactions

// Example: Establishing a real-time connection
import { RTVIEvent, RTVIMessage, PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";

const pcClient = new PipecatClient({
  transport: new DailyTransport(),
  enableMic: true,
  enableCam: false,
  enableScreenShare: false,
  callbacks: {
    onBotConnected: () => {
      console.log("[CALLBACK] Bot connected");
    },
    onBotDisconnected: () => {
      console.log("[CALLBACK] Bot disconnected");
    },
    onBotReady: () => {
      console.log("[CALLBACK] Bot ready to chat!");
    },
  },
});

try {
  // Below, we use a REST endpoint to fetch connection credentials for our
  // Daily Transport. Alternatively, you could provide those credentials
  // directly to `connect()`.
  await pcClient.connect({
    endpoint: "https://your-connect-end-point-here/connect",
  });
} catch (e) {
  console.error(e.message);
}

// Events (alternative approach to constructor-provided callbacks)
pcClient.on(RTVIEvent.Connected, () => {
  console.log("[EVENT] User connected");
});
pcClient.on(RTVIEvent.Disconnected, () => {
  console.log("[EVENT] User disconnected");
});

​ Custom Messaging
Send custom messages and handle responses from your bot.
This is useful for:
- Running server-side functionality
- Triggering specific bot actions
- Querying the server
- Responding to server requests

import { PipecatClient } from "@pipecat-ai/client-js";

const pcClient = new PipecatClient({
  transport: new DailyTransport(),
  callbacks: {
    onBotConnected: () => {
      pcClient.sendClientRequest('get-language')
        .then((response) => {
          console.log("[CALLBACK] Bot using language:", response);
          if (response !== preferredLanguage) {
            pcClient.sendClientMessage('set-language', { language: preferredLanguage });
          }
        })
        .catch((error) => {
          console.error("[CALLBACK] Error getting language:", error);
        });
    },
    onServerMessage: (message) => {
      console.log("[CALLBACK] Received message from server:", message);
    },
  },
});

await pcClient.connect({
  url: "https://your-daily-room-url",
  token: "your-daily-token"
});

​ About RTVI
Pipecat’s client SDKs implement the RTVI (Real-Time Voice and Video Inference) standard, an open specification for real-time AI inference. This means:
- Your code can work with any RTVI-compatible inference service
- You get battle-tested tooling for real-time multimedia handling
- You can easily set up development and testing environments

​ Next Steps
Get started by trying out examples:
Simple Chatbot Example: Complete client-server example with both bot backend (Python) and frontend implementation (JS, React, React Native, iOS, and Android).
More Examples: Explore our full collection of example applications and implementations across different platforms and use cases.
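Server-side, the 'get-language' request above arrives as an RTVI client-message whose data carries a t (type) field and an optional d (data) field, and the reply must echo the request's id. The following is a transport-agnostic sketch in plain Python dicts; the handler registry, the handler name, and the "en-US" value are made up for illustration, and this is not Pipecat's actual server API.

```python
import uuid


def handle_get_language(d):
    # Hypothetical handler: report the bot's current language.
    return {"t": "get-language", "d": {"language": "en-US"}}


HANDLERS = {"get-language": handle_get_language}


def dispatch(message: dict) -> dict:
    """Route an RTVI client-message to a handler and build the reply."""
    payload = message.get("data") or {}
    handler = HANDLERS.get(payload.get("t"))
    if handler is None:
        # Unknown t values get an error-response, still correlated by id.
        return {"id": message["id"], "label": "rtvi-ai",
                "type": "error-response", "data": {"error": "unknown message type"}}
    # A server-response reuses the request's id so the client can correlate it.
    return {"id": message["id"], "label": "rtvi-ai",
            "type": "server-response", "data": handler(payload.get("d"))}


request = {"id": str(uuid.uuid4()), "label": "rtvi-ai",
           "type": "client-message", "data": {"t": "get-language"}}
response = dispatch(request)
assert response["id"] == request["id"]  # correlated with the original request
```

The id echo is the key design point: it is what lets sendClientRequest() on the client resolve its promise with the matching reply.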
\ No newline at end of file diff --git a/client_migration-guide_87a52ab8.txt b/client_migration-guide_87a52ab8.txt new file mode 100644 index 0000000000000000000000000000000000000000..7aa927404ffff5379ca3d2ecd46eebe6244fab77 --- /dev/null +++ b/client_migration-guide_87a52ab8.txt @@ -0,0 +1,5 @@ +URL: https://docs.pipecat.ai/client/migration-guide +Title: RTVIClient Migration Guide - Pipecat +================================================== + +RTVIClient Migration Guide - Pipecat

This guide covers the high-level changes between the old RTVIClient and the new PipecatClient. For specific code updates, refer to the platform-specific migration guides.

​ Key changes
Client Name: The class name has changed from RTVIClient to PipecatClient.
Pipeline Connection: Previously, the client expected a REST endpoint for gathering connection information as part of the constructor, which was difficult to update or bypass. The new client expects connection information to be provided directly to the connect() method, either as an object with the details your Transport requires or as an object with REST endpoint details for acquiring them.
Actions and helpers: These have been removed in favor of built-in methods for common actions, such as function call handling and appending to the LLM context; for custom actions, there is a simple set of methods for sending messages to the bot and handling responses.
See registerFunctionCallHandler(), appendToContext(), sendClientMessage(), and sendClientRequest() for more details.
Bot Configuration: This functionality has been removed as a security measure, so that a client cannot inherently override a bot's configuration and use its credentials however it likes. If you need the client to initialize or update the bot configuration, do so through an API call to your backend or build on top of the client-server messaging, which has now been made easier.

The Client SDKs are currently in the process of making these changes. At this time, only the JavaScript and React libraries have been updated and released. Their corresponding documentation, along with this top-level documentation, has been updated to reflect the latest changes. The React Native, iOS, and Android SDKs are still in the process of being updated; their documentation will be updated and a migration guide provided once the new versions are released. If you have any questions or need assistance, please reach out to us on Discord.

​ Migration guides
JavaScript: Migrate your JavaScript client code to the new PipecatClient
React: Update your React components to use the new PipecatClient

\ No newline at end of file diff --git a/client_rtvi-standard_54977b15.txt b/client_rtvi-standard_54977b15.txt new file mode 100644 index 0000000000000000000000000000000000000000..4ed2d31acc1eae29dc8916a2cb84e330d2f139bb --- /dev/null +++ b/client_rtvi-standard_54977b15.txt @@ -0,0 +1,5 @@ +URL: https://docs.pipecat.ai/client/rtvi-standard#user-started-speaking-%F0%9F%A4%96 +Title: The RTVI Standard - Pipecat +================================================== + +The RTVI Standard - Pipecat
The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models, and the backends running those models, in real time. This page documents version 1.0 of the RTVI standard, released in June 2025.

​ Key Features
Connection Management: RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state.
Transcriptions: The standard includes built-in support for real-time transcription of audio streams.
Client-Server Messaging: The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses.
Advanced LLM Interactions: The standard supports advanced interactions with large language models (LLMs), including context management, function call handling, and search results.
Service-Specific Insights: RTVI supports events to provide insight into the input/output and state of the typical services in speech-to-speech workflows.
Metrics and Monitoring: RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services.

​ Terms
Client: The front-end application or user interface that interacts with the RTVI server.
Server: The back-end service that runs the AI framework and processes requests from the client.
User: The end user interacting with the client application.
Bot: The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service.

​ RTVI Message Format
The messages defined as part of the RTVI protocol adhere to the following format:

{ "id": string, "label": "rtvi-ai", "type": string, "data": unknown }

id (string): A unique identifier for the message, used to correlate requests and responses.
label (string, required, default "rtvi-ai"): A label that identifies this message as an RTVI message. This field is required and should always be set to 'rtvi-ai'.
type (string, required): The type of message being sent. This field is required and should be set to one of the predefined RTVI message types listed below.
data (unknown): The payload of the message, which can be any data structure relevant to the message type.

​ RTVI Message Types
Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type below includes either a 🤖 or 🏄 emoji to denote whether the message is sent from the bot (🤖) or the client (🏄).

​ Connection Management
​ client-ready 🏄
Indicates that the client is ready to receive messages and interact with the server. Typically sent after the transport media channels have connected.
type: 'client-ready'
data:
version (string): The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations.
about (AboutClient Object): An object containing information about the client, such as its rtvi-version, client library, and any other relevant metadata.
The AboutClient object follows this structure:
library (string, required)
library_version (string)
platform (string)
platform_version (string)
platform_details (any): Any platform-specific details that may be relevant to the server. This could include information about the browser, operating system, or any other environment-specific data needed by the server. This field is optional and open-ended, so please be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personally identifiable information.

​ bot-ready 🤖
Indicates that the bot is ready to receive messages and interact with the client. Typically sent after the transport media channels have connected.
type: 'bot-ready'
data:
version (string): The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations.
about (any, optional): An object containing information about the server or bot. Its structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time, without any built-in security concerns. Please be mindful of the data you include here and any security concerns that may arise from exposing sensitive information.

​ disconnect-bot 🏄
Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: Disconnects should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client may want to remain connected to the transport but no longer wishes to interact with the bot.
type: 'disconnect-bot'
data: undefined

​ error 🤖
Indicates an error occurred during bot initialization or runtime.
type: 'error'
data:
message (string): Description of the error.
fatal (boolean): Indicates if the error is fatal to the session.

​ Transcription
​ user-started-speaking 🤖
Emitted when the user begins speaking.
type: 'user-started-speaking'
data: None

​ user-stopped-speaking 🤖
Emitted when the user stops speaking.
type: 'user-stopped-speaking'
data: None

​ bot-started-speaking 🤖
Emitted when the bot begins speaking.
type: 'bot-started-speaking'
data: None

​ bot-stopped-speaking 🤖
Emitted when the bot stops speaking.
type: 'bot-stopped-speaking'
data: None

​ user-transcription 🤖
Real-time transcription of user speech, including both partial and final results.
type: 'user-transcription'
data:
text (string): The transcribed text of the user.
final (boolean): Indicates if this is a final transcription or a partial result.
timestamp (string): The timestamp when the transcription was generated.
user_id (string): Identifier for the user who spoke.

​ bot-transcription 🤖
Transcription of the bot’s speech. Note: This protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions. Rather, the event is typically sent for each sentence of the bot’s response. This difference is currently due to limitations in TTS services, which mostly do not support (or support well) accurate timing information. If/when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription to match your bot’s speaking, you can try using the bot-tts-text message type.
type: 'bot-transcription'
data:
text (string): The transcribed text from the bot, typically aggregated at a per-sentence level.

​ Client-Server Messaging
​ server-message 🤖
An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client.
type: 'server-message'
data (any): The data can be any JSON-serializable object, formatted according to your own specifications.

​ client-message 🏄
An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server.
type: 'client-message'
data:
t (string)
d (unknown, optional)
The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message.

​ server-response 🤖
A message sent from the server to the client in response to a client-message. IMPORTANT: The id should match the id of the original client-message to correlate the response with the request.
type: 'server-response'
data:
t (string)
d (unknown, optional)
The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message.

​ error-response 🤖
Error response to a specific client message. IMPORTANT: The id should match the id of the original client-message to correlate the response with the request.
type: 'error-response'
data:
error (string)

​ Advanced LLM Interactions
​ append-to-context 🏄
A message sent from the client to the server to append data to the context of the current LLM conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant.
type: 'append-to-context'
data:
role ("user" | "assistant"): The role the context should be appended to. Currently only supports "user" and "assistant".
content (unknown): The content to append to the context. This can be any data structure the LLM understands.
run_immediately (boolean, optional): Indicates whether the context should be run immediately after appending. Defaults to false. If set to false, the context will be appended but not executed until the next LLM run.
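To make the append-to-context shape concrete, here is a small helper that builds the message as a plain Python dict. This is a sketch based on the fields above, not an official SDK function; the example content is made up.

```python
def append_to_context_message(role, content, run_immediately=False):
    """Build an RTVI append-to-context message per the fields above."""
    if role not in ("user", "assistant"):
        raise ValueError("role must be 'user' or 'assistant'")
    return {
        "label": "rtvi-ai",
        "type": "append-to-context",
        "data": {
            "role": role,
            "content": content,
            "run_immediately": run_immediately,
        },
    }


# Append a user turn and ask the LLM to run on it right away.
msg = append_to_context_message(
    "user", {"text": "And what about Tuesday?"}, run_immediately=True
)
```

Validating the role client-side mirrors the spec's restriction to "user" and "assistant"; a "system" role, for example, would be rejected before anything is sent to the server.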
​ llm-function-call 🤖
A function call request from the LLM, sent from the bot to the client. Note that in most cases, an LLM function call will be handled completely server-side. However, in the event that the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required.
type: 'llm-function-call'
data:
function_name (string): Name of the function to be called.
tool_call_id (string): Unique identifier for this function call.
args (Record): Arguments to be passed to the function.

​ llm-function-call-result 🏄
The result of the function call requested by the LLM, returned from the client.
type: 'llm-function-call-result'
data:
function_name (string): Name of the called function.
tool_call_id (string): Identifier matching the original function call.
args (Record): Arguments that were passed to the function.
result (Record | string): The result returned by the function.

​ bot-llm-search-response 🤖
Search results from the LLM’s knowledge base. Currently, Google Gemini is the only LLM that supports built-in search. However, we expect other LLMs to follow suit, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies.
type: 'bot-llm-search-response'
data:
search_result (string, optional): Raw search result text.
rendered_content (string, optional): Formatted version of the search results.
origins (Array): Source information and confidence scores for search results.
The Origin Object follows this structure:

{
  "site_uri": string (optional),
  "site_title": string (optional),
  "results": Array<{ "text": string, "confidence": number[] }>
}

Example:

"id": undefined
"label": "rtvi-ai"
"type": "bot-llm-search-response"
"data": {
  "origins": [
    {
      "results": [
        {
          "confidence": [0.9881149530410768],
          "text": "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm."
        },
        {
          "confidence": [0.9692034721374512],
          "text": "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm."
        }
      ],
      "site_title": "vanderbilt.edu",
      "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi"
    },
    {
      "results": [
        {
          "confidence": [0.6554043292999268],
          "text": "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields."
        }
      ],
      "site_title": "wikipedia.org",
      "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo"
    }
  ],
  "rendered_content": "