Add files using upload-large-folder tool
This view is limited to 50 files because it contains too many changes.
- _sources_indexrsttxt_c33fce45.txt +5 -0
- analytics_sentry_20a4ab88.txt +5 -0
- analytics_sentry_45cc1930.txt +5 -0
- analytics_sentry_6c729748.txt +5 -0
- audio_audio-buffer-processor_2d35df28.txt +5 -0
- audio_audio-buffer-processor_35423de9.txt +5 -0
- audio_audio-buffer-processor_40dcd59a.txt +5 -0
- audio_audio-buffer-processor_41cb801d.txt +5 -0
- audio_audio-buffer-processor_9b72446e.txt +5 -0
- audio_audio-buffer-processor_d66f73d3.txt +5 -0
- audio_koala-filter_20ce4d60.txt +5 -0
- audio_koala-filter_e4d8a296.txt +5 -0
- audio_krisp-filter_39dfde86.txt +5 -0
- audio_krisp-filter_8376d300.txt +5 -0
- audio_krisp-filter_b89f262b.txt +5 -0
- audio_krisp-filter_ddfb5acd.txt +5 -0
- audio_noisereduce-filter_1e57a17d.txt +5 -0
- audio_noisereduce-filter_252ff8b7.txt +5 -0
- audio_silero-vad-analyzer_663f6c6f.txt +5 -0
- audio_silero-vad-analyzer_7a9c8d73.txt +5 -0
- audio_silero-vad-analyzer_90a6d9cd.txt +5 -0
- audio_silero-vad-analyzer_edb34478.txt +5 -0
- audio_soundfile-mixer_3a5c086e.txt +5 -0
- audio_soundfile-mixer_4e4b1cf9.txt +5 -0
- audio_soundfile-mixer_78c1ff2d.txt +5 -0
- base-classes_media_0c2e6a18.txt +5 -0
- base-classes_media_adb613bc.txt +5 -0
- base-classes_speech_0ac5790e.txt +5 -0
- base-classes_speech_67e5e89f.txt +5 -0
- base-classes_speech_7200e74e.txt +5 -0
- base-classes_speech_8cd29105.txt +5 -0
- base-classes_text_d8b7bcb1.txt +5 -0
- c_introduction_2d027ecf.txt +5 -0
- c_introduction_ea0aa8d6.txt +5 -0
- c_transport_bd70965d.txt +5 -0
- client_introduction_394dd56e.txt +5 -0
- client_introduction_769681ac.txt +5 -0
- client_migration-guide_cf5b62a9.txt +5 -0
- client_rtvi-standard_30d7d570.txt +5 -0
- client_rtvi-standard_63b2ad42.txt +5 -0
- client_rtvi-standard_7786a985.txt +5 -0
- client_rtvi-standard_d5f72539.txt +5 -0
- client_rtvi-standard_d8e92ae0.txt +5 -0
- client_rtvi-standard_ef4532ad.txt +5 -0
- daily_rest-helpers_053a92dd.txt +5 -0
- daily_rest-helpers_19bd418b.txt +5 -0
- daily_rest-helpers_33ab0889.txt +5 -0
- daily_rest-helpers_41f93594.txt +5 -0
- daily_rest-helpers_5ded7034.txt +5 -0
- daily_rest-helpers_6423975d.txt +5 -0
_sources_indexrsttxt_c33fce45.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/links/_sources/index.rst.txt#real-time-processing
+Title: Overview - Pipecat
+==================================================
+
+Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. "Multimodal" means you can use any combination of audio, video, images, and/or text in your interactions. And "real-time" means that things are happening quickly enough that it feels conversational: a "back-and-forth" with a bot, not submitting a query and waiting for results. What You Can Build: Voice Assistants: Natural, real-time conversations with AI using speech recognition and synthesis. Interactive Agents: Personal coaches and meeting assistants that can understand context and provide guidance. Multimodal Apps: Applications that combine voice, video, images, and text for rich interactions. Creative Tools: Storytelling experiences and social companions that engage users. Business Solutions: Customer intake flows and support bots for automated business processes. Complex Flows: Structured conversations using Pipecat Flows for managing complex interactions. How It Works: The flow of interactions in a Pipecat application is typically straightforward: The bot says something. The user says something. The bot says something. The user says something. This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. Real-time Processing: Pipecat's pipeline architecture handles both simple voice interactions and complex multimodal processing. Let's look at how data flows through the system. Voice app: 1. Send Audio: Transmit and capture streamed audio from the user. 2. Transcribe Speech: Convert speech to text as the user is talking. 3. Process with LLM: Generate responses using a large language model. 4. Convert to Speech: Transform text responses into natural speech. 5. Play Audio: Stream the audio response back to the user. Multimodal app: 1. Send Audio and Video: Transmit and capture audio, video, and image inputs simultaneously. 2. Process Streams: Handle multiple input streams in parallel. 3. Model Processing: Send combined inputs to multimodal models (like GPT-4V). 4. Generate Outputs: Create various outputs (text, images, audio, etc.). 5. Coordinate Presentation: Synchronize and present multiple output types. In both cases, Pipecat: Processes responses as they stream in; Handles multiple input/output modalities concurrently; Manages resource allocation and synchronization; Coordinates parallel processing tasks. This architecture creates fluid, natural interactions without noticeable delays, whether you're building a simple voice assistant or a complex multimodal application. Pipecat's pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. Next Steps: Ready to build your first Pipecat application? Installation & Setup: Prepare your environment and install required dependencies. Quickstart: Build and run your first Pipecat application. Core Concepts: Learn about pipelines, frames, and real-time processing. Use Cases: Explore example implementations and patterns. Join Our Community: Discord Community: Connect with other developers, share your projects, and get support from the Pipecat team.
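Editor's note: the five-step voice flow described on this page maps directly onto a Pipecat Pipeline. The sketch below is illustrative only; the Deepgram STT choice, the run_bot entry point, and module paths not shown in these captured pages (Pipeline, PipelineTask, PipelineRunner, SileroVADAnalyzer, DeepgramSTTService) are assumptions, not taken from the page text.

# Minimal voice-bot sketch of the Send Audio -> Transcribe -> LLM -> TTS -> Play Audio flow.
# Module paths marked "assumed" below should be checked against your installed Pipecat version.
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer          # assumed path
from pipecat.pipeline.pipeline import Pipeline                   # assumed path
from pipecat.pipeline.runner import PipelineRunner               # assumed path
from pipecat.pipeline.task import PipelineTask                   # assumed path
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService     # assumed path
from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def run_bot(room_url: str, token: str):
    transport = DailyTransport(
        room_url, token, "Overview bot",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True,
                    vad_analyzer=SileroVADAnalyzer()),
    )
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))      # 2. transcribe speech
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))          # 3. process with LLM
    tts = ElevenLabsTTSService(api_key=os.getenv("ELEVENLABS_API_KEY"))  # 4. convert to speech
    context = OpenAILLMContext([{"role": "system", "content": "You are a helpful voice bot."}])
    agg = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(),   # 1. receive streamed user audio
        stt, agg.user(), llm, tts,
        transport.output(),  # 5. play the audio response back to the user
        agg.assistant(),
    ])
    await PipelineRunner().run(PipelineTask(pipeline))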
analytics_sentry_20a4ab88.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/analytics/sentry#overview
+Title: Sentry Metrics - Pipecat
+==================================================
+
+Overview: SentryMetrics extends FrameProcessorMetrics to provide performance monitoring integration with Sentry. It tracks Time to First Byte (TTFB) and processing duration metrics for frame processors. Installation: To use Sentry metrics, install the Sentry SDK: pip install "pipecat-ai[sentry]" Configuration: Sentry must be initialized in your application before metrics will be collected: import sentry_sdk sentry_sdk.init(dsn="your-sentry-dsn", traces_sample_rate=1.0) Usage Example: import sentry_sdk from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.metrics.sentry import SentryMetrics from pipecat.transports.services.daily import DailyParams, DailyTransport async def create_metrics_pipeline(): sentry_sdk.init(dsn="your-sentry-dsn", traces_sample_rate=1.0) transport = DailyTransport(room_url, token, "Chatbot", DailyParams(audio_out_enabled=True, audio_in_enabled=True, video_out_enabled=False, vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True)) tts = ElevenLabsTTSService(api_key=os.getenv("ELEVENLABS_API_KEY"), metrics=SentryMetrics()) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o", metrics=SentryMetrics()) messages = [{"role": "system", "content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself. Keep all your responses to 12 words or fewer."}] context = OpenAILLMContext(messages) context_aggregator = llm.create_context_aggregator(context) # Use in pipeline pipeline = Pipeline([transport.input(), context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant()]) Transaction Information: Each transaction includes: Operation type (ttfb or processing); Description with processor name; Start timestamp; End timestamp; Unique transaction ID. Fallback Behavior: If Sentry is not available (not installed or not initialized): Warning logs are generated; Metric methods execute without error; No data is sent to Sentry. Notes: Requires Sentry SDK to be installed and initialized. Thread-safe metric collection. Automatic transaction management. Supports selective TTFB reporting. Integrates with Sentry's performance monitoring. Provides detailed timing information. Maintains timing data even when Sentry is unavailable.
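Editor's note: the usage example above stops at building the Pipeline and never shows the pipeline actually being run or metrics being switched on. The short continuation below is a sketch under stated assumptions: the PipelineParams/PipelineTask/PipelineRunner module paths and the enable_metrics flag are assumptions about the Pipecat API, not taken from the captured page.

# Hypothetical continuation of create_metrics_pipeline(): metrics collection is typically
# enabled at the task level. Verify enable_metrics against your PipelineParams version.
from pipecat.pipeline.runner import PipelineRunner            # assumed path
from pipecat.pipeline.task import PipelineParams, PipelineTask  # assumed path

task = PipelineTask(pipeline, params=PipelineParams(enable_metrics=True))
runner = PipelineRunner()
await runner.run(task)  # TTFB and processing transactions are sent to Sentry while the bot runs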
analytics_sentry_45cc1930.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/analytics/sentry#fallback-behavior
+Title: Sentry Metrics - Pipecat
+==================================================
+
+Overview: SentryMetrics extends FrameProcessorMetrics to provide performance monitoring integration with Sentry. It tracks Time to First Byte (TTFB) and processing duration metrics for frame processors. Installation: To use Sentry metrics, install the Sentry SDK: pip install "pipecat-ai[sentry]" Configuration: Sentry must be initialized in your application before metrics will be collected: import sentry_sdk sentry_sdk.init(dsn="your-sentry-dsn", traces_sample_rate=1.0) Usage Example: import sentry_sdk from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.metrics.sentry import SentryMetrics from pipecat.transports.services.daily import DailyParams, DailyTransport async def create_metrics_pipeline(): sentry_sdk.init(dsn="your-sentry-dsn", traces_sample_rate=1.0) transport = DailyTransport(room_url, token, "Chatbot", DailyParams(audio_out_enabled=True, audio_in_enabled=True, video_out_enabled=False, vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True)) tts = ElevenLabsTTSService(api_key=os.getenv("ELEVENLABS_API_KEY"), metrics=SentryMetrics()) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o", metrics=SentryMetrics()) messages = [{"role": "system", "content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself. Keep all your responses to 12 words or fewer."}] context = OpenAILLMContext(messages) context_aggregator = llm.create_context_aggregator(context) # Use in pipeline pipeline = Pipeline([transport.input(), context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant()]) Transaction Information: Each transaction includes: Operation type (ttfb or processing); Description with processor name; Start timestamp; End timestamp; Unique transaction ID. Fallback Behavior: If Sentry is not available (not installed or not initialized): Warning logs are generated; Metric methods execute without error; No data is sent to Sentry. Notes: Requires Sentry SDK to be installed and initialized. Thread-safe metric collection. Automatic transaction management. Supports selective TTFB reporting. Integrates with Sentry's performance monitoring. Provides detailed timing information. Maintains timing data even when Sentry is unavailable.
analytics_sentry_6c729748.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/services/analytics/sentry#transaction-information
+Title: Sentry Metrics - Pipecat
+==================================================
+
+Overview: SentryMetrics extends FrameProcessorMetrics to provide performance monitoring integration with Sentry. It tracks Time to First Byte (TTFB) and processing duration metrics for frame processors. Installation: To use Sentry metrics, install the Sentry SDK: pip install "pipecat-ai[sentry]" Configuration: Sentry must be initialized in your application before metrics will be collected: import sentry_sdk sentry_sdk.init(dsn="your-sentry-dsn", traces_sample_rate=1.0) Usage Example: import sentry_sdk from pipecat.services.openai.llm import OpenAILLMService from pipecat.services.elevenlabs.tts import ElevenLabsTTSService from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext from pipecat.processors.metrics.sentry import SentryMetrics from pipecat.transports.services.daily import DailyParams, DailyTransport async def create_metrics_pipeline(): sentry_sdk.init(dsn="your-sentry-dsn", traces_sample_rate=1.0) transport = DailyTransport(room_url, token, "Chatbot", DailyParams(audio_out_enabled=True, audio_in_enabled=True, video_out_enabled=False, vad_analyzer=SileroVADAnalyzer(), transcription_enabled=True)) tts = ElevenLabsTTSService(api_key=os.getenv("ELEVENLABS_API_KEY"), metrics=SentryMetrics()) llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o", metrics=SentryMetrics()) messages = [{"role": "system", "content": "You are Chatbot, a friendly, helpful robot. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way, but keep your responses brief. Start by introducing yourself. Keep all your responses to 12 words or fewer."}] context = OpenAILLMContext(messages) context_aggregator = llm.create_context_aggregator(context) # Use in pipeline pipeline = Pipeline([transport.input(), context_aggregator.user(), llm, tts, transport.output(), context_aggregator.assistant()]) Transaction Information: Each transaction includes: Operation type (ttfb or processing); Description with processor name; Start timestamp; End timestamp; Unique transaction ID. Fallback Behavior: If Sentry is not available (not installed or not initialized): Warning logs are generated; Metric methods execute without error; No data is sent to Sentry. Notes: Requires Sentry SDK to be installed and initialized. Thread-safe metric collection. Automatic transaction management. Supports selective TTFB reporting. Integrates with Sentry's performance monitoring. Provides detailed timing information. Maintains timing data even when Sentry is unavailable.
audio_audio-buffer-processor_2d35df28.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#sample-rate
+Title: AudioBufferProcessor - Pipecat
+==================================================
+
+Overview: The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows. Constructor: AudioBufferProcessor(sample_rate=None, num_channels=1, buffer_size=0, enable_turn_audio=False, **kwargs) Parameters: sample_rate (Optional[int], default None): The desired output sample rate in Hz. If None, uses the transport's sample rate from the StartFrame. num_channels (int, default 1): Number of output audio channels. 1: Mono output (user and bot audio are mixed together); 2: Stereo output (user audio on left channel, bot audio on right channel). buffer_size (int, default 0): Buffer size in bytes that triggers audio data events. 0: Events only trigger when recording stops; >0: Events trigger whenever the buffer reaches this size (useful for chunked processing). enable_turn_audio (bool, default False): Whether to enable per-turn audio event handlers (on_user_turn_audio_data and on_bot_turn_audio_data). Properties: sample_rate: @property def sample_rate(self) -> int: The current sample rate of the audio processor in Hz. num_channels: @property def num_channels(self) -> int: The number of channels in the audio output (1 for mono, 2 for stereo). Methods: start_recording(): async def start_recording(): Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers. stop_recording(): async def stop_recording(): Stop recording and trigger final audio data handlers with any remaining buffered audio. has_audio(): def has_audio() -> bool: Check if both user and bot audio buffers contain data. Returns True if both buffers contain audio data. Event Handlers: The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @processor.event_handler() decorator. on_audio_data: Triggered when buffer_size is reached or recording stops, providing merged audio. @audiobuffer.event_handler("on_audio_data") async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle merged audio data. Parameters: buffer: The AudioBufferProcessor instance; audio: Merged audio data (format depends on the num_channels setting); sample_rate: Sample rate in Hz; num_channels: Number of channels (1 or 2). on_track_audio_data: Triggered alongside on_audio_data, providing separate user and bot audio tracks. @audiobuffer.event_handler("on_track_audio_data") async def on_track_audio_data(buffer, user_audio: bytes, bot_audio: bytes, sample_rate: int, num_channels: int): # Handle separate audio tracks. Parameters: buffer: The AudioBufferProcessor instance; user_audio: Raw user audio bytes (always mono); bot_audio: Raw bot audio bytes (always mono); sample_rate: Sample rate in Hz; num_channels: Always 1 for individual tracks. on_user_turn_audio_data: Triggered when a user speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_user_turn_audio_data") async def on_user_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle user turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the user's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). on_bot_turn_audio_data: Triggered when a bot speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_bot_turn_audio_data") async def on_bot_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle bot turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the bot's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). Audio Processing Features: Automatic resampling: Converts incoming audio to the specified sample rate. Buffer synchronization: Aligns user and bot audio streams temporally. Silence insertion: Fills gaps in non-continuous audio streams to maintain timing. Turn tracking: Monitors speaking turns when enable_turn_audio=True. Integration Notes: STT Audio Passthrough: If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor: stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True). audio_passthrough is enabled by default. Pipeline Placement: Add the AudioBufferProcessor after transport.output() to capture both user and bot audio: pipeline = Pipeline([transport.input(), # ... other processors ... transport.output(), audiobuffer, # Place after audio output # ... remaining processors ... ])
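Editor's note: the handler signatures and methods above are enough to record a whole conversation to disk. The sketch below uses only the documented API plus the standard-library wave module; the AudioBufferProcessor import path and the suggestion to start recording from a "client connected" transport event are assumptions about typical usage, not taken from the page text.

# Minimal recording sketch, assuming the documented AudioBufferProcessor API.
import wave

from pipecat.processors.audio.audio_buffer_processor import AudioBufferProcessor  # assumed path

audiobuffer = AudioBufferProcessor(num_channels=2)  # stereo: user on left, bot on right


@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
    # With buffer_size=0 this fires once when recording stops, with the merged 16-bit PCM audio.
    with wave.open("conversation.wav", "wb") as f:
        f.setnchannels(num_channels)
        f.setsampwidth(2)  # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(audio)

# Place audiobuffer after transport.output() in the Pipeline, then call
# await audiobuffer.start_recording() once the session is up (for example from a transport
# "client connected" handler) and await audiobuffer.stop_recording() when it ends.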
audio_audio-buffer-processor_35423de9.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#on-bot-turn-audio-data
+Title: AudioBufferProcessor - Pipecat
+==================================================
+
+Overview: The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows. Constructor: AudioBufferProcessor(sample_rate=None, num_channels=1, buffer_size=0, enable_turn_audio=False, **kwargs) Parameters: sample_rate (Optional[int], default None): The desired output sample rate in Hz. If None, uses the transport's sample rate from the StartFrame. num_channels (int, default 1): Number of output audio channels. 1: Mono output (user and bot audio are mixed together); 2: Stereo output (user audio on left channel, bot audio on right channel). buffer_size (int, default 0): Buffer size in bytes that triggers audio data events. 0: Events only trigger when recording stops; >0: Events trigger whenever the buffer reaches this size (useful for chunked processing). enable_turn_audio (bool, default False): Whether to enable per-turn audio event handlers (on_user_turn_audio_data and on_bot_turn_audio_data). Properties: sample_rate: @property def sample_rate(self) -> int: The current sample rate of the audio processor in Hz. num_channels: @property def num_channels(self) -> int: The number of channels in the audio output (1 for mono, 2 for stereo). Methods: start_recording(): async def start_recording(): Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers. stop_recording(): async def stop_recording(): Stop recording and trigger final audio data handlers with any remaining buffered audio. has_audio(): def has_audio() -> bool: Check if both user and bot audio buffers contain data. Returns True if both buffers contain audio data. Event Handlers: The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @processor.event_handler() decorator. on_audio_data: Triggered when buffer_size is reached or recording stops, providing merged audio. @audiobuffer.event_handler("on_audio_data") async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle merged audio data. Parameters: buffer: The AudioBufferProcessor instance; audio: Merged audio data (format depends on the num_channels setting); sample_rate: Sample rate in Hz; num_channels: Number of channels (1 or 2). on_track_audio_data: Triggered alongside on_audio_data, providing separate user and bot audio tracks. @audiobuffer.event_handler("on_track_audio_data") async def on_track_audio_data(buffer, user_audio: bytes, bot_audio: bytes, sample_rate: int, num_channels: int): # Handle separate audio tracks. Parameters: buffer: The AudioBufferProcessor instance; user_audio: Raw user audio bytes (always mono); bot_audio: Raw bot audio bytes (always mono); sample_rate: Sample rate in Hz; num_channels: Always 1 for individual tracks. on_user_turn_audio_data: Triggered when a user speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_user_turn_audio_data") async def on_user_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle user turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the user's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). on_bot_turn_audio_data: Triggered when a bot speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_bot_turn_audio_data") async def on_bot_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle bot turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the bot's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). Audio Processing Features: Automatic resampling: Converts incoming audio to the specified sample rate. Buffer synchronization: Aligns user and bot audio streams temporally. Silence insertion: Fills gaps in non-continuous audio streams to maintain timing. Turn tracking: Monitors speaking turns when enable_turn_audio=True. Integration Notes: STT Audio Passthrough: If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor: stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True). audio_passthrough is enabled by default. Pipeline Placement: Add the AudioBufferProcessor after transport.output() to capture both user and bot audio: pipeline = Pipeline([transport.input(), # ... other processors ... transport.output(), audiobuffer, # Place after audio output # ... remaining processors ... ])
audio_audio-buffer-processor_40dcd59a.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#methods
+Title: AudioBufferProcessor - Pipecat
+==================================================
+
+Overview: The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows. Constructor: AudioBufferProcessor(sample_rate=None, num_channels=1, buffer_size=0, enable_turn_audio=False, **kwargs) Parameters: sample_rate (Optional[int], default None): The desired output sample rate in Hz. If None, uses the transport's sample rate from the StartFrame. num_channels (int, default 1): Number of output audio channels. 1: Mono output (user and bot audio are mixed together); 2: Stereo output (user audio on left channel, bot audio on right channel). buffer_size (int, default 0): Buffer size in bytes that triggers audio data events. 0: Events only trigger when recording stops; >0: Events trigger whenever the buffer reaches this size (useful for chunked processing). enable_turn_audio (bool, default False): Whether to enable per-turn audio event handlers (on_user_turn_audio_data and on_bot_turn_audio_data). Properties: sample_rate: @property def sample_rate(self) -> int: The current sample rate of the audio processor in Hz. num_channels: @property def num_channels(self) -> int: The number of channels in the audio output (1 for mono, 2 for stereo). Methods: start_recording(): async def start_recording(): Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers. stop_recording(): async def stop_recording(): Stop recording and trigger final audio data handlers with any remaining buffered audio. has_audio(): def has_audio() -> bool: Check if both user and bot audio buffers contain data. Returns True if both buffers contain audio data. Event Handlers: The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @processor.event_handler() decorator. on_audio_data: Triggered when buffer_size is reached or recording stops, providing merged audio. @audiobuffer.event_handler("on_audio_data") async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle merged audio data. Parameters: buffer: The AudioBufferProcessor instance; audio: Merged audio data (format depends on the num_channels setting); sample_rate: Sample rate in Hz; num_channels: Number of channels (1 or 2). on_track_audio_data: Triggered alongside on_audio_data, providing separate user and bot audio tracks. @audiobuffer.event_handler("on_track_audio_data") async def on_track_audio_data(buffer, user_audio: bytes, bot_audio: bytes, sample_rate: int, num_channels: int): # Handle separate audio tracks. Parameters: buffer: The AudioBufferProcessor instance; user_audio: Raw user audio bytes (always mono); bot_audio: Raw bot audio bytes (always mono); sample_rate: Sample rate in Hz; num_channels: Always 1 for individual tracks. on_user_turn_audio_data: Triggered when a user speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_user_turn_audio_data") async def on_user_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle user turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the user's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). on_bot_turn_audio_data: Triggered when a bot speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_bot_turn_audio_data") async def on_bot_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle bot turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the bot's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). Audio Processing Features: Automatic resampling: Converts incoming audio to the specified sample rate. Buffer synchronization: Aligns user and bot audio streams temporally. Silence insertion: Fills gaps in non-continuous audio streams to maintain timing. Turn tracking: Monitors speaking turns when enable_turn_audio=True. Integration Notes: STT Audio Passthrough: If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor: stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True). audio_passthrough is enabled by default. Pipeline Placement: Add the AudioBufferProcessor after transport.output() to capture both user and bot audio: pipeline = Pipeline([transport.input(), # ... other processors ... transport.output(), audiobuffer, # Place after audio output # ... remaining processors ... ])
audio_audio-buffer-processor_41cb801d.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#on-track-audio-data
+Title: AudioBufferProcessor - Pipecat
+==================================================
+
+Overview: The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows. Constructor: AudioBufferProcessor(sample_rate=None, num_channels=1, buffer_size=0, enable_turn_audio=False, **kwargs) Parameters: sample_rate (Optional[int], default None): The desired output sample rate in Hz. If None, uses the transport's sample rate from the StartFrame. num_channels (int, default 1): Number of output audio channels. 1: Mono output (user and bot audio are mixed together); 2: Stereo output (user audio on left channel, bot audio on right channel). buffer_size (int, default 0): Buffer size in bytes that triggers audio data events. 0: Events only trigger when recording stops; >0: Events trigger whenever the buffer reaches this size (useful for chunked processing). enable_turn_audio (bool, default False): Whether to enable per-turn audio event handlers (on_user_turn_audio_data and on_bot_turn_audio_data). Properties: sample_rate: @property def sample_rate(self) -> int: The current sample rate of the audio processor in Hz. num_channels: @property def num_channels(self) -> int: The number of channels in the audio output (1 for mono, 2 for stereo). Methods: start_recording(): async def start_recording(): Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers. stop_recording(): async def stop_recording(): Stop recording and trigger final audio data handlers with any remaining buffered audio. has_audio(): def has_audio() -> bool: Check if both user and bot audio buffers contain data. Returns True if both buffers contain audio data. Event Handlers: The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @processor.event_handler() decorator. on_audio_data: Triggered when buffer_size is reached or recording stops, providing merged audio. @audiobuffer.event_handler("on_audio_data") async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle merged audio data. Parameters: buffer: The AudioBufferProcessor instance; audio: Merged audio data (format depends on the num_channels setting); sample_rate: Sample rate in Hz; num_channels: Number of channels (1 or 2). on_track_audio_data: Triggered alongside on_audio_data, providing separate user and bot audio tracks. @audiobuffer.event_handler("on_track_audio_data") async def on_track_audio_data(buffer, user_audio: bytes, bot_audio: bytes, sample_rate: int, num_channels: int): # Handle separate audio tracks. Parameters: buffer: The AudioBufferProcessor instance; user_audio: Raw user audio bytes (always mono); bot_audio: Raw bot audio bytes (always mono); sample_rate: Sample rate in Hz; num_channels: Always 1 for individual tracks. on_user_turn_audio_data: Triggered when a user speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_user_turn_audio_data") async def on_user_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle user turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the user's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). on_bot_turn_audio_data: Triggered when a bot speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_bot_turn_audio_data") async def on_bot_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle bot turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the bot's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). Audio Processing Features: Automatic resampling: Converts incoming audio to the specified sample rate. Buffer synchronization: Aligns user and bot audio streams temporally. Silence insertion: Fills gaps in non-continuous audio streams to maintain timing. Turn tracking: Monitors speaking turns when enable_turn_audio=True. Integration Notes: STT Audio Passthrough: If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor: stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True). audio_passthrough is enabled by default. Pipeline Placement: Add the AudioBufferProcessor after transport.output() to capture both user and bot audio: pipeline = Pipeline([transport.input(), # ... other processors ... transport.output(), audiobuffer, # Place after audio output # ... remaining processors ... ])
audio_audio-buffer-processor_9b72446e.txt
ADDED
@@ -0,0 +1,5 @@
+URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#overview
+Title: AudioBufferProcessor - Pipecat
+==================================================
+
+Overview: The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows. Constructor: AudioBufferProcessor(sample_rate=None, num_channels=1, buffer_size=0, enable_turn_audio=False, **kwargs) Parameters: sample_rate (Optional[int], default None): The desired output sample rate in Hz. If None, uses the transport's sample rate from the StartFrame. num_channels (int, default 1): Number of output audio channels. 1: Mono output (user and bot audio are mixed together); 2: Stereo output (user audio on left channel, bot audio on right channel). buffer_size (int, default 0): Buffer size in bytes that triggers audio data events. 0: Events only trigger when recording stops; >0: Events trigger whenever the buffer reaches this size (useful for chunked processing). enable_turn_audio (bool, default False): Whether to enable per-turn audio event handlers (on_user_turn_audio_data and on_bot_turn_audio_data). Properties: sample_rate: @property def sample_rate(self) -> int: The current sample rate of the audio processor in Hz. num_channels: @property def num_channels(self) -> int: The number of channels in the audio output (1 for mono, 2 for stereo). Methods: start_recording(): async def start_recording(): Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers. stop_recording(): async def stop_recording(): Stop recording and trigger final audio data handlers with any remaining buffered audio. has_audio(): def has_audio() -> bool: Check if both user and bot audio buffers contain data. Returns True if both buffers contain audio data. Event Handlers: The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @processor.event_handler() decorator. on_audio_data: Triggered when buffer_size is reached or recording stops, providing merged audio. @audiobuffer.event_handler("on_audio_data") async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle merged audio data. Parameters: buffer: The AudioBufferProcessor instance; audio: Merged audio data (format depends on the num_channels setting); sample_rate: Sample rate in Hz; num_channels: Number of channels (1 or 2). on_track_audio_data: Triggered alongside on_audio_data, providing separate user and bot audio tracks. @audiobuffer.event_handler("on_track_audio_data") async def on_track_audio_data(buffer, user_audio: bytes, bot_audio: bytes, sample_rate: int, num_channels: int): # Handle separate audio tracks. Parameters: buffer: The AudioBufferProcessor instance; user_audio: Raw user audio bytes (always mono); bot_audio: Raw bot audio bytes (always mono); sample_rate: Sample rate in Hz; num_channels: Always 1 for individual tracks. on_user_turn_audio_data: Triggered when a user speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_user_turn_audio_data") async def on_user_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle user turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the user's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). on_bot_turn_audio_data: Triggered when a bot speaking turn ends. Requires enable_turn_audio=True. @audiobuffer.event_handler("on_bot_turn_audio_data") async def on_bot_turn_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int): # Handle bot turn audio. Parameters: buffer: The AudioBufferProcessor instance; audio: Audio data from the bot's speaking turn; sample_rate: Sample rate in Hz; num_channels: Always 1 (mono). Audio Processing Features: Automatic resampling: Converts incoming audio to the specified sample rate. Buffer synchronization: Aligns user and bot audio streams temporally. Silence insertion: Fills gaps in non-continuous audio streams to maintain timing. Turn tracking: Monitors speaking turns when enable_turn_audio=True. Integration Notes: STT Audio Passthrough: If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor: stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True). audio_passthrough is enabled by default. Pipeline Placement: Add the AudioBufferProcessor after transport.output() to capture both user and bot audio: pipeline = Pipeline([transport.input(), # ... other processors ... transport.output(), audiobuffer, # Place after audio output # ... remaining processors ... ])
audio_audio-buffer-processor_d66f73d3.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/audio/audio-buffer-processor#param-num-channels
|
2 |
+
Title: AudioBufferProcessor - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
AudioBufferProcessor - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Audio Processing AudioBufferProcessor Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing AudioBufferProcessor KoalaFilter KrispFilter NoisereduceFilter SileroVADAnalyzer SoundfileMixer Frame Filters Metrics and Telemetry MCP Observers Service Utilities Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline β Overview The AudioBufferProcessor captures and buffers audio frames from both input (user) and output (bot) sources during conversations. It provides synchronized audio streams with configurable sample rates, supports both mono and stereo output, and offers flexible event handlers for various audio processing workflows. β Constructor Copy Ask AI AudioBufferProcessor( sample_rate = None , num_channels = 1 , buffer_size = 0 , enable_turn_audio = False , ** kwargs ) β Parameters β sample_rate Optional[int] default: "None" The desired output sample rate in Hz. If None , uses the transportβs sample rate from the StartFrame . β num_channels int default: "1" Number of output audio channels: 1 : Mono output (user and bot audio are mixed together) 2 : Stereo output (user audio on left channel, bot audio on right channel) β buffer_size int default: "0" Buffer size in bytes that triggers audio data events: 0 : Events only trigger when recording stops >0 : Events trigger whenever buffer reaches this size (useful for chunked processing) β enable_turn_audio bool default: "False" Whether to enable per-turn audio event handlers ( on_user_turn_audio_data and on_bot_turn_audio_data ). β Properties β sample_rate Copy Ask AI @ property def sample_rate ( self ) -> int The current sample rate of the audio processor in Hz. β num_channels Copy Ask AI @ property def num_channels ( self ) -> int The number of channels in the audio output (1 for mono, 2 for stereo). β Methods β start_recording() Copy Ask AI async def start_recording () Start recording audio from both user and bot sources. Initializes recording state and resets audio buffers. β stop_recording() Copy Ask AI async def stop_recording () Stop recording and trigger final audio data handlers with any remaining buffered audio. β has_audio() Copy Ask AI def has_audio () -> bool Check if both user and bot audio buffers contain data. Returns: True if both buffers contain audio data. β Event Handlers The processor supports multiple event handlers for different audio processing workflows. Register handlers using the @processor.event_handler() decorator. β on_audio_data Triggered when buffer_size is reached or recording stops, providing merged audio. 
Copy Ask AI @audiobuffer.event_handler ( "on_audio_data" ) async def on_audio_data ( buffer , audio : bytes , sample_rate : int , num_channels : int ): # Handle merged audio data pass Parameters: buffer : The AudioBufferProcessor instance audio : Merged audio data (format depends on num_channels setting) sample_rate : Sample rate in Hz num_channels : Number of channels (1 or 2) β on_track_audio_data Triggered alongside on_audio_data , providing separate user and bot audio tracks. Copy Ask AI @audiobuffer.event_handler ( "on_track_audio_data" ) async def on_track_audio_data ( buffer , user_audio : bytes , bot_audio : bytes , sample_rate : int , num_channels : int ): # Handle separate audio tracks pass Parameters: buffer : The AudioBufferProcessor instance user_audio : Raw user audio bytes (always mono) bot_audio : Raw bot audio bytes (always mono) sample_rate : Sample rate in Hz num_channels : Always 1 for individual tracks β on_user_turn_audio_data Triggered when a user speaking turn ends. Requires enable_turn_audio=True . Copy Ask AI @audiobuffer.event_handler ( "on_user_turn_audio_data" ) async def on_user_turn_audio_data ( buffer , audio : bytes , sample_rate : int , num_channels : int ): # Handle user turn audio pass Parameters: buffer : The AudioBufferProcessor instance audio : Audio data from the userβs speaking turn sample_rate : Sample rate in Hz num_channels : Always 1 (mono) β on_bot_turn_audio_data Triggered when a bot speaking turn ends. Requires enable_turn_audio=True . Copy Ask AI @audiobuffer.event_handler ( "on_bot_turn_audio_data" ) async def on_bot_turn_audio_data ( buffer , audio : bytes , sample_rate : int , num_channels : int ): # Handle bot turn audio pass Parameters: buffer : The AudioBufferProcessor instance audio : Audio data from the botβs speaking turn sample_rate : Sample rate in Hz num_channels : Always 1 (mono) β Audio Processing Features Automatic resampling : Converts incoming audio to the specified sample rate Buffer synchronization : Aligns user and bot audio streams temporally Silence insertion : Fills gaps in non-continuous audio streams to maintain timing Turn tracking : Monitors speaking turns when enable_turn_audio=True β Integration Notes β STT Audio Passthrough If using an STT service in your pipeline, enable audio passthrough to make audio available to the AudioBufferProcessor: Copy Ask AI stt = DeepgramSTTService( api_key = os.getenv( "DEEPGRAM_API_KEY" ), audio_passthrough = True , ) audio_passthrough is enabled by default. β Pipeline Placement Add the AudioBufferProcessor after transport.output() to capture both user and bot audio: Copy Ask AI pipeline = Pipeline([ transport.input(), # ... other processors ... transport.output(), audiobuffer, # Place after audio output # ... remaining processors ... ]) UserIdleProcessor KoalaFilter On this page Overview Constructor Parameters Properties sample_rate num_channels Methods start_recording() stop_recording() has_audio() Event Handlers on_audio_data on_track_audio_data on_user_turn_audio_data on_bot_turn_audio_data Audio Processing Features Integration Notes STT Audio Passthrough Pipeline Placement Assistant Responses are generated using AI and may contain mistakes.
audio_koala-filter_20ce4d60.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/koala-filter#overview
2 + Title: KoalaFilter - Pipecat
3 + ==================================================
4 +
5 + KoalaFilter - Pipecat. Overview: KoalaFilter is an audio processor that reduces background noise in real-time audio streams using Koala Noise Suppression technology from Picovoice. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. To use Koala, you need a Picovoice access key; get started at Picovoice Console. Installation: the Koala filter requires additional dependencies: pip install "pipecat-ai[koala]". You'll also need to set your Koala access key as an environment variable: KOALA_ACCESS_KEY. Constructor Parameters: access_key (str, required) - Picovoice access key for using the Koala noise suppression service. Input Frames: FilterEnableFrame - control frame to toggle filtering on/off: from pipecat.frames.frames import FilterEnableFrame; await task.queue_frame(FilterEnableFrame(False)) to disable noise reduction; await task.queue_frame(FilterEnableFrame(True)) to re-enable it. Usage Example: from pipecat.audio.filters.koala_filter import KoalaFilter; transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_filter=KoalaFilter(access_key=os.getenv("KOALA_ACCESS_KEY")), audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer())). Audio Flow: (diagram). Notes: requires a Picovoice access key; supports real-time audio processing; handles 16-bit PCM audio; can be dynamically enabled/disabled; maintains audio quality while reducing noise; efficient processing for low latency; automatically handles audio frame buffering; the sample rate must match Koala's required sample rate.
audio_koala-filter_e4d8a296.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/koala-filter#param-access-key
2 + Title: KoalaFilter - Pipecat
3 + ==================================================
4 +
5 + KoalaFilter - Pipecat. Overview: KoalaFilter is an audio processor that reduces background noise in real-time audio streams using Koala Noise Suppression technology from Picovoice. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. To use Koala, you need a Picovoice access key; get started at Picovoice Console. Installation: the Koala filter requires additional dependencies: pip install "pipecat-ai[koala]". You'll also need to set your Koala access key as an environment variable: KOALA_ACCESS_KEY. Constructor Parameters: access_key (str, required) - Picovoice access key for using the Koala noise suppression service. Input Frames: FilterEnableFrame - control frame to toggle filtering on/off: from pipecat.frames.frames import FilterEnableFrame; await task.queue_frame(FilterEnableFrame(False)) to disable noise reduction; await task.queue_frame(FilterEnableFrame(True)) to re-enable it. Usage Example: from pipecat.audio.filters.koala_filter import KoalaFilter; transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_filter=KoalaFilter(access_key=os.getenv("KOALA_ACCESS_KEY")), audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer())). Audio Flow: (diagram). Notes: requires a Picovoice access key; supports real-time audio processing; handles 16-bit PCM audio; can be dynamically enabled/disabled; maintains audio quality while reducing noise; efficient processing for low latency; automatically handles audio frame buffering; the sample rate must match Koala's required sample rate.
audio_krisp-filter_39dfde86.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/krisp-filter#input-frames
2 + Title: KrispFilter - Pipecat
3 + ==================================================
4 +
5 + KrispFilter - Pipecat. Overview: KrispFilter is an audio processor that reduces background noise in real-time audio streams using Krisp AI technology. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. To use Krisp, you need a Krisp SDK license; get started at Krisp.ai. Looking for help getting started with Krisp and Pipecat? Check out our Krisp noise cancellation guide. Installation: the Krisp filter requires additional dependencies: pip install "pipecat-ai[krisp]". Environment Variables: you need to provide the path to the Krisp model, either by setting the KRISP_MODEL_PATH environment variable or by setting model_path in the constructor. Constructor Parameters: sample_type (str, default "PCM_16") - audio sample type format; channels (int, default 1) - number of audio channels; model_path (str, default None) - path to the Krisp model file (set model_path directly, or set the KRISP_MODEL_PATH environment variable to the model file path). Input Frames: FilterEnableFrame - control frame to toggle filtering on/off: from pipecat.frames.frames import FilterEnableFrame; await task.queue_frame(FilterEnableFrame(False)) to disable noise reduction; await task.queue_frame(FilterEnableFrame(True)) to re-enable it. Usage Example: from pipecat.audio.filters.krisp_filter import KrispFilter; transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_filter=KrispFilter(), audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer())). Audio Flow: (diagram). Notes: requires the Krisp SDK and model file to be available; supports real-time audio processing; supports additional features such as background voice removal; handles PCM_16 audio; thread-safe for pipeline processing; can be dynamically enabled/disabled; maintains audio quality while reducing noise; efficient processing for low latency.
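The model path can come either from the KRISP_MODEL_PATH environment variable or from the constructor, as noted above. A small sketch of the explicit-constructor variant, where the fallback path is a placeholder rather than a real model location:

import os

from pipecat.audio.filters.krisp_filter import KrispFilter

# Equivalent to relying on the KRISP_MODEL_PATH environment variable, but with
# the model location passed explicitly. The fallback path is a placeholder.
krisp_filter = KrispFilter(
    sample_type="PCM_16",
    channels=1,
    model_path=os.getenv("KRISP_MODEL_PATH", "/path/to/krisp-model"),
)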
audio_krisp-filter_8376d300.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/krisp-filter#overview
2 + Title: KrispFilter - Pipecat
3 + ==================================================
4 +
5 + KrispFilter - Pipecat. Overview: KrispFilter is an audio processor that reduces background noise in real-time audio streams using Krisp AI technology. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. To use Krisp, you need a Krisp SDK license; get started at Krisp.ai. Looking for help getting started with Krisp and Pipecat? Check out our Krisp noise cancellation guide. Installation: the Krisp filter requires additional dependencies: pip install "pipecat-ai[krisp]". Environment Variables: you need to provide the path to the Krisp model, either by setting the KRISP_MODEL_PATH environment variable or by setting model_path in the constructor. Constructor Parameters: sample_type (str, default "PCM_16") - audio sample type format; channels (int, default 1) - number of audio channels; model_path (str, default None) - path to the Krisp model file (set model_path directly, or set the KRISP_MODEL_PATH environment variable to the model file path). Input Frames: FilterEnableFrame - control frame to toggle filtering on/off: from pipecat.frames.frames import FilterEnableFrame; await task.queue_frame(FilterEnableFrame(False)) to disable noise reduction; await task.queue_frame(FilterEnableFrame(True)) to re-enable it. Usage Example: from pipecat.audio.filters.krisp_filter import KrispFilter; transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_filter=KrispFilter(), audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer())). Audio Flow: (diagram). Notes: requires the Krisp SDK and model file to be available; supports real-time audio processing; supports additional features such as background voice removal; handles PCM_16 audio; thread-safe for pipeline processing; can be dynamically enabled/disabled; maintains audio quality while reducing noise; efficient processing for low latency.
audio_krisp-filter_b89f262b.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/api-reference/utilities/audio/krisp-filter#real-time-processing
2 + Title: Overview - Pipecat
3 + ==================================================
4 +
5 +
Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
audio_krisp-filter_ddfb5acd.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/krisp-filter#notes
2 + Title: KrispFilter - Pipecat
3 + ==================================================
4 +
5 + KrispFilter - Pipecat. Overview: KrispFilter is an audio processor that reduces background noise in real-time audio streams using Krisp AI technology. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. To use Krisp, you need a Krisp SDK license; get started at Krisp.ai. Looking for help getting started with Krisp and Pipecat? Check out our Krisp noise cancellation guide. Installation: the Krisp filter requires additional dependencies: pip install "pipecat-ai[krisp]". Environment Variables: you need to provide the path to the Krisp model, either by setting the KRISP_MODEL_PATH environment variable or by setting model_path in the constructor. Constructor Parameters: sample_type (str, default "PCM_16") - audio sample type format; channels (int, default 1) - number of audio channels; model_path (str, default None) - path to the Krisp model file (set model_path directly, or set the KRISP_MODEL_PATH environment variable to the model file path). Input Frames: FilterEnableFrame - control frame to toggle filtering on/off: from pipecat.frames.frames import FilterEnableFrame; await task.queue_frame(FilterEnableFrame(False)) to disable noise reduction; await task.queue_frame(FilterEnableFrame(True)) to re-enable it. Usage Example: from pipecat.audio.filters.krisp_filter import KrispFilter; transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_filter=KrispFilter(), audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer())). Audio Flow: (diagram). Notes: requires the Krisp SDK and model file to be available; supports real-time audio processing; supports additional features such as background voice removal; handles PCM_16 audio; thread-safe for pipeline processing; can be dynamically enabled/disabled; maintains audio quality while reducing noise; efficient processing for low latency.
audio_noisereduce-filter_1e57a17d.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter#input-frames
2 + Title: NoisereduceFilter - Pipecat
3 + ==================================================
4 +
5 + NoisereduceFilter - Pipecat. Overview: NoisereduceFilter is an audio processor that reduces background noise in real-time audio streams using the noisereduce library. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. Installation: the noisereduce filter requires additional dependencies: pip install "pipecat-ai[noisereduce]". Constructor Parameters: this filter has no configurable parameters in its constructor. Input Frames: FilterEnableFrame - control frame to toggle filtering on/off: from pipecat.frames.frames import FilterEnableFrame; await task.queue_frame(FilterEnableFrame(False)) to disable noise reduction; await task.queue_frame(FilterEnableFrame(True)) to re-enable it. Usage Example: from pipecat.audio.filters.noisereduce_filter import NoisereduceFilter; transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_filter=NoisereduceFilter(), audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer())). Audio Flow: (diagram). Notes: lightweight alternative to Krisp for noise reduction; supports real-time audio processing; handles PCM_16 audio; thread-safe for pipeline processing; can be dynamically enabled/disabled; requires no additional configuration; uses statistical noise reduction techniques.
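Since NoisereduceFilter is described above as a lightweight alternative to Krisp, one common pattern is to fall back to it when no Krisp model is configured. A minimal sketch under that assumption; the selection helper is illustrative and not part of Pipecat:

import os

from pipecat.audio.filters.noisereduce_filter import NoisereduceFilter


def choose_noise_filter():
    """Prefer Krisp when a model is configured; otherwise use the lightweight filter."""
    if os.getenv("KRISP_MODEL_PATH"):
        # Imported lazily so the [krisp] extra is only needed when actually used.
        from pipecat.audio.filters.krisp_filter import KrispFilter

        return KrispFilter()
    return NoisereduceFilter()


# The chosen filter is passed to the transport exactly as in the usage example:
#   DailyParams(audio_in_filter=choose_noise_filter(), audio_in_enabled=True, ...)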
audio_noisereduce-filter_252ff8b7.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/noisereduce-filter#overview
2 + Title: NoisereduceFilter - Pipecat
3 + ==================================================
4 +
5 + NoisereduceFilter - Pipecat. Overview: NoisereduceFilter is an audio processor that reduces background noise in real-time audio streams using the noisereduce library. It inherits from BaseAudioFilter and processes audio frames to improve audio quality by removing unwanted noise. Installation: the noisereduce filter requires additional dependencies: pip install "pipecat-ai[noisereduce]". Constructor Parameters: this filter has no configurable parameters in its constructor. Input Frames: FilterEnableFrame - control frame to toggle filtering on/off: from pipecat.frames.frames import FilterEnableFrame; await task.queue_frame(FilterEnableFrame(False)) to disable noise reduction; await task.queue_frame(FilterEnableFrame(True)) to re-enable it. Usage Example: from pipecat.audio.filters.noisereduce_filter import NoisereduceFilter; transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_filter=NoisereduceFilter(), audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer())). Audio Flow: (diagram). Notes: lightweight alternative to Krisp for noise reduction; supports real-time audio processing; handles PCM_16 audio; thread-safe for pipeline processing; can be dynamically enabled/disabled; requires no additional configuration; uses statistical noise reduction techniques.
audio_silero-vad-analyzer_663f6c6f.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer#param-sample-rate
2 + Title: SileroVADAnalyzer - Pipecat
3 + ==================================================
4 +
5 + SileroVADAnalyzer - Pipecat. Overview: SileroVADAnalyzer is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It provides high-accuracy speech detection with efficient processing using the ONNX runtime. Installation: the Silero VAD analyzer requires additional dependencies: pip install "pipecat-ai[silero]". Constructor Parameters: sample_rate (int, default None) - audio sample rate in Hz; must be either 8000 or 16000. params (VADParams, default VADParams()) - Voice Activity Detection parameters object with: confidence (float, default 0.7) - confidence threshold for speech detection; higher values make detection stricter; must be between 0 and 1. start_secs (float, default 0.2) - time in seconds that speech must be detected before transitioning to the SPEAKING state. stop_secs (float, default 0.8) - time in seconds of silence required before transitioning back to the QUIET state. min_volume (float, default 0.6) - minimum audio volume threshold for speech detection; must be between 0 and 1. Usage Example: transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)))). Technical Details: Sample Rate Requirements - the analyzer supports two sample rates: 8000 Hz (256 samples per frame) and 16000 Hz (512 samples per frame). Model Management - uses the ONNX runtime for efficient inference; automatically resets model state every 5 seconds to manage memory; runs on CPU by default for consistent performance; includes a built-in model file. Notes: high-accuracy speech detection; efficient ONNX-based processing; automatic memory management; thread-safe for pipeline processing; built-in model file included; CPU-optimized inference; supports 8 kHz and 16 kHz audio.
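To make the parameter defaults above concrete, here is a sketch that constructs the analyzer with every VADParams field spelled out. The import paths are assumed from current Pipecat releases and may differ in older versions.

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams

# Every field mirrors the documented default except stop_secs, which is
# shortened so the bot replies sooner after the user stops speaking.
vad_analyzer = SileroVADAnalyzer(
    sample_rate=16000,  # must be 8000 or 16000
    params=VADParams(
        confidence=0.7,   # speech-detection threshold (0-1)
        start_secs=0.2,   # speech needed before entering SPEAKING
        stop_secs=0.5,    # silence needed before returning to QUIET
        min_volume=0.6,   # minimum volume treated as speech (0-1)
    ),
)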
audio_silero-vad-analyzer_7a9c8d73.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer#overview
2 + Title: SileroVADAnalyzer - Pipecat
3 + ==================================================
4 +
5 + SileroVADAnalyzer - Pipecat. Overview: SileroVADAnalyzer is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It provides high-accuracy speech detection with efficient processing using the ONNX runtime. Installation: the Silero VAD analyzer requires additional dependencies: pip install "pipecat-ai[silero]". Constructor Parameters: sample_rate (int, default None) - audio sample rate in Hz; must be either 8000 or 16000. params (VADParams, default VADParams()) - Voice Activity Detection parameters object with: confidence (float, default 0.7) - confidence threshold for speech detection; higher values make detection stricter; must be between 0 and 1. start_secs (float, default 0.2) - time in seconds that speech must be detected before transitioning to the SPEAKING state. stop_secs (float, default 0.8) - time in seconds of silence required before transitioning back to the QUIET state. min_volume (float, default 0.6) - minimum audio volume threshold for speech detection; must be between 0 and 1. Usage Example: transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)))). Technical Details: Sample Rate Requirements - the analyzer supports two sample rates: 8000 Hz (256 samples per frame) and 16000 Hz (512 samples per frame). Model Management - uses the ONNX runtime for efficient inference; automatically resets model state every 5 seconds to manage memory; runs on CPU by default for consistent performance; includes a built-in model file. Notes: high-accuracy speech detection; efficient ONNX-based processing; automatic memory management; thread-safe for pipeline processing; built-in model file included; CPU-optimized inference; supports 8 kHz and 16 kHz audio.
audio_silero-vad-analyzer_90a6d9cd.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer#installation
2 + Title: SileroVADAnalyzer - Pipecat
3 + ==================================================
4 +
5 + SileroVADAnalyzer - Pipecat. Overview: SileroVADAnalyzer is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It provides high-accuracy speech detection with efficient processing using the ONNX runtime. Installation: the Silero VAD analyzer requires additional dependencies: pip install "pipecat-ai[silero]". Constructor Parameters: sample_rate (int, default None) - audio sample rate in Hz; must be either 8000 or 16000. params (VADParams, default VADParams()) - Voice Activity Detection parameters object with: confidence (float, default 0.7) - confidence threshold for speech detection; higher values make detection stricter; must be between 0 and 1. start_secs (float, default 0.2) - time in seconds that speech must be detected before transitioning to the SPEAKING state. stop_secs (float, default 0.8) - time in seconds of silence required before transitioning back to the QUIET state. min_volume (float, default 0.6) - minimum audio volume threshold for speech detection; must be between 0 and 1. Usage Example: transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)))). Technical Details: Sample Rate Requirements - the analyzer supports two sample rates: 8000 Hz (256 samples per frame) and 16000 Hz (512 samples per frame). Model Management - uses the ONNX runtime for efficient inference; automatically resets model state every 5 seconds to manage memory; runs on CPU by default for consistent performance; includes a built-in model file. Notes: high-accuracy speech detection; efficient ONNX-based processing; automatic memory management; thread-safe for pipeline processing; built-in model file included; CPU-optimized inference; supports 8 kHz and 16 kHz audio.
audio_silero-vad-analyzer_edb34478.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer#param-stop-secs
2 + Title: SileroVADAnalyzer - Pipecat
3 + ==================================================
4 +
5 + SileroVADAnalyzer - Pipecat. Overview: SileroVADAnalyzer is a Voice Activity Detection (VAD) analyzer that uses the Silero VAD ONNX model to detect speech in audio streams. It provides high-accuracy speech detection with efficient processing using the ONNX runtime. Installation: the Silero VAD analyzer requires additional dependencies: pip install "pipecat-ai[silero]". Constructor Parameters: sample_rate (int, default None) - audio sample rate in Hz; must be either 8000 or 16000. params (VADParams, default VADParams()) - Voice Activity Detection parameters object with: confidence (float, default 0.7) - confidence threshold for speech detection; higher values make detection stricter; must be between 0 and 1. start_secs (float, default 0.2) - time in seconds that speech must be detected before transitioning to the SPEAKING state. stop_secs (float, default 0.8) - time in seconds of silence required before transitioning back to the QUIET state. min_volume (float, default 0.6) - minimum audio volume threshold for speech detection; must be between 0 and 1. Usage Example: transport = DailyTransport(room_url, token, "Respond bot", DailyParams(audio_in_enabled=True, audio_out_enabled=True, vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.5)))). Technical Details: Sample Rate Requirements - the analyzer supports two sample rates: 8000 Hz (256 samples per frame) and 16000 Hz (512 samples per frame). Model Management - uses the ONNX runtime for efficient inference; automatically resets model state every 5 seconds to manage memory; runs on CPU by default for consistent performance; includes a built-in model file. Notes: high-accuracy speech detection; efficient ONNX-based processing; automatic memory management; thread-safe for pipeline processing; built-in model file included; CPU-optimized inference; supports 8 kHz and 16 kHz audio.
audio_soundfile-mixer_3a5c086e.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/soundfile-mixer#param-volume
2 + Title: SoundfileMixer - Pipecat
3 + ==================================================
4 +
5 + SoundfileMixer - Pipecat. Overview: SoundfileMixer is an audio mixer that combines incoming audio with audio from files. It supports multiple audio file formats through the soundfile library and can handle runtime volume adjustments and sound switching. Installation: the soundfile mixer requires additional dependencies: pip install "pipecat-ai[soundfile]". Constructor Parameters: sound_files (Mapping[str, str], required) - dictionary mapping sound names to file paths; files must be mono (single channel). default_sound (str, required) - name of the default sound to play (must be a key in sound_files). volume (float, default 0.4) - initial volume for the mixed sound; values typically range from 0.0 to 1.0 but can go higher. loop (bool, default True) - whether to loop the sound file when it reaches the end. Control Frames: MixerUpdateSettingsFrame - updates mixer settings at runtime; properties: sound (str) - changes the currently playing sound (must be a key in sound_files); volume (float) - updates the mixing volume; loop (bool) - updates whether the sound should loop. MixerEnableFrame - enables or disables the mixer; property: enable (bool) - whether mixing should be enabled. Usage Example: mixer = SoundfileMixer(sound_files={"office": "office_ambience.wav"}, default_sound="office", volume=2.0); transport = DailyTransport(room_url, token, "Audio Bot", DailyParams(audio_out_enabled=True, audio_out_mixer=mixer)); at runtime: await task.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5})); await task.queue_frame(MixerEnableFrame(False)) to disable mixing; await task.queue_frame(MixerEnableFrame(True)) to enable mixing. Notes: supports any audio format that soundfile can read; automatically resamples audio files to match the output sample rate; files must be mono (single channel); thread-safe for pipeline processing; can dynamically switch between multiple sound files; volume can be adjusted in real time; mixing can be enabled/disabled on demand.
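Building on the usage example above, here is a short sketch of switching between two registered sounds at runtime. The "cafe" sound name is a placeholder for a second entry in sound_files, and the import path for the control frames is assumed to be pipecat.frames.frames.

from pipecat.frames.frames import MixerEnableFrame, MixerUpdateSettingsFrame


async def switch_background(task):
    """Swap the background bed to the 'cafe' loop, then briefly pause mixing."""
    # "cafe" must be one of the keys passed to SoundfileMixer(sound_files=...).
    await task.queue_frame(MixerUpdateSettingsFrame({"sound": "cafe", "volume": 0.3}))
    await task.queue_frame(MixerEnableFrame(False))  # pause background mixing
    await task.queue_frame(MixerEnableFrame(True))   # resume background mixing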
audio_soundfile-mixer_4e4b1cf9.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/soundfile-mixer#param-sound
2 + Title: SoundfileMixer - Pipecat
3 + ==================================================
4 +
5 + SoundfileMixer - Pipecat. Overview: SoundfileMixer is an audio mixer that combines incoming audio with audio from files. It supports multiple audio file formats through the soundfile library and can handle runtime volume adjustments and sound switching. Installation: the soundfile mixer requires additional dependencies: pip install "pipecat-ai[soundfile]". Constructor Parameters: sound_files (Mapping[str, str], required) - dictionary mapping sound names to file paths; files must be mono (single channel). default_sound (str, required) - name of the default sound to play (must be a key in sound_files). volume (float, default 0.4) - initial volume for the mixed sound; values typically range from 0.0 to 1.0 but can go higher. loop (bool, default True) - whether to loop the sound file when it reaches the end. Control Frames: MixerUpdateSettingsFrame - updates mixer settings at runtime; properties: sound (str) - changes the currently playing sound (must be a key in sound_files); volume (float) - updates the mixing volume; loop (bool) - updates whether the sound should loop. MixerEnableFrame - enables or disables the mixer; property: enable (bool) - whether mixing should be enabled. Usage Example: mixer = SoundfileMixer(sound_files={"office": "office_ambience.wav"}, default_sound="office", volume=2.0); transport = DailyTransport(room_url, token, "Audio Bot", DailyParams(audio_out_enabled=True, audio_out_mixer=mixer)); at runtime: await task.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5})); await task.queue_frame(MixerEnableFrame(False)) to disable mixing; await task.queue_frame(MixerEnableFrame(True)) to enable mixing. Notes: supports any audio format that soundfile can read; automatically resamples audio files to match the output sample rate; files must be mono (single channel); thread-safe for pipeline processing; can dynamically switch between multiple sound files; volume can be adjusted in real time; mixing can be enabled/disabled on demand.
audio_soundfile-mixer_78c1ff2d.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/utilities/audio/soundfile-mixer#installation
2 + Title: SoundfileMixer - Pipecat
3 + ==================================================
4 +
5 + SoundfileMixer - Pipecat. Overview: SoundfileMixer is an audio mixer that combines incoming audio with audio from files. It supports multiple audio file formats through the soundfile library and can handle runtime volume adjustments and sound switching. Installation: the soundfile mixer requires additional dependencies: pip install "pipecat-ai[soundfile]". Constructor Parameters: sound_files (Mapping[str, str], required) - dictionary mapping sound names to file paths; files must be mono (single channel). default_sound (str, required) - name of the default sound to play (must be a key in sound_files). volume (float, default 0.4) - initial volume for the mixed sound; values typically range from 0.0 to 1.0 but can go higher. loop (bool, default True) - whether to loop the sound file when it reaches the end. Control Frames: MixerUpdateSettingsFrame - updates mixer settings at runtime; properties: sound (str) - changes the currently playing sound (must be a key in sound_files); volume (float) - updates the mixing volume; loop (bool) - updates whether the sound should loop. MixerEnableFrame - enables or disables the mixer; property: enable (bool) - whether mixing should be enabled. Usage Example: mixer = SoundfileMixer(sound_files={"office": "office_ambience.wav"}, default_sound="office", volume=2.0); transport = DailyTransport(room_url, token, "Audio Bot", DailyParams(audio_out_enabled=True, audio_out_mixer=mixer)); at runtime: await task.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5})); await task.queue_frame(MixerEnableFrame(False)) to disable mixing; await task.queue_frame(MixerEnableFrame(True)) to enable mixing. Notes: supports any audio format that soundfile can read; automatically resamples audio files to match the output sample rate; files must be mono (single channel); thread-safe for pipeline processing; can dynamically switch between multiple sound files; volume can be adjusted in real time; mixing can be enabled/disabled on demand.
base-classes_media_0c2e6a18.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/base-classes/media#next-steps
2 + Title: Overview - Pipecat
3 + ==================================================
4 +
5 +
Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
base-classes_media_adb613bc.txt
ADDED
@@ -0,0 +1,5 @@
1 + URL: https://docs.pipecat.ai/server/base-classes/media#visionservice
2 + Title: Overview - Pipecat
3 + ==================================================
4 +
5 +
Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
base-classes_speech_0ac5790e.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/base-classes/speech#methods
Title: Overview - Pipecat
==================================================

Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
base-classes_speech_67e5e89f.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/base-classes/speech#what-you-can-build
Title: Overview - Pipecat
==================================================

Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
base-classes_speech_7200e74e.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/base-classes/speech#join-our-community
Title: Overview - Pipecat
==================================================

Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
base-classes_speech_8cd29105.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/base-classes/speech#ttsservice
Title: Overview - Pipecat
==================================================

Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
base-classes_text_d8b7bcb1.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/server/base-classes/text#next-steps
Title: Overview - Pipecat
==================================================

Overview - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Get Started Overview Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Get Started Overview Installation & Setup Quickstart Core Concepts Next Steps & Examples Pipecat is an open source Python framework that handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions. βMultimodalβ means you can use any combination of audio, video, images, and/or text in your interactions. And βreal-timeβ means that things are happening quickly enough that it feels conversationalβa βback-and-forthβ with a bot, not submitting a query and waiting for results. β What You Can Build Voice Assistants Natural, real-time conversations with AI using speech recognition and synthesis Interactive Agents Personal coaches and meeting assistants that can understand context and provide guidance Multimodal Apps Applications that combine voice, video, images, and text for rich interactions Creative Tools Storytelling experiences and social companions that engage users Business Solutions Customer intake flows and support bots for automated business processes Complex Flows Structured conversations using Pipecat Flows for managing complex interactions β How It Works The flow of interactions in a Pipecat application is typically straightforward: The bot says something The user says something The bot says something The user says something This continues until the conversation naturally ends. While this flow seems simple, making it feel natural requires sophisticated real-time processing. β Real-time Processing Pipecatβs pipeline architecture handles both simple voice interactions and complex multimodal processing. Letβs look at how data flows through the system: Voice app Multimodal app 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio Transmit and capture streamed audio from the user 2 Transcribe Speech Convert speech to text as the user is talking 3 Process with LLM Generate responses using a large language model 4 Convert to Speech Transform text responses into natural speech 5 Play Audio Stream the audio response back to the user 1 Send Audio and Video Transmit and capture audio, video, and image inputs simultaneously 2 Process Streams Handle multiple input streams in parallel 3 Model Processing Send combined inputs to multimodal models (like GPT-4V) 4 Generate Outputs Create various outputs (text, images, audio, etc.) 5 Coordinate Presentation Synchronize and present multiple output types In both cases, Pipecat: Processes responses as they stream in Handles multiple input/output modalities concurrently Manages resource allocation and synchronization Coordinates parallel processing tasks This architecture creates fluid, natural interactions without noticeable delays, whether youβre building a simple voice assistant or a complex multimodal application. Pipecatβs pipeline architecture is particularly valuable for managing the complexity of real-time, multimodal interactions, ensuring smooth data flow and proper synchronization regardless of the input/output types involved. 
Pipecat handles all this complexity for you, letting you focus on building your application rather than managing the underlying infrastructure. β Next Steps Ready to build your first Pipecat application? Installation & Setup Prepare your environment and install required dependencies Quickstart Build and run your first Pipecat application Core Concepts Learn about pipelines, frames, and real-time processing Use Cases Explore example implementations and patterns β Join Our Community Discord Community Connect with other developers, share your projects, and get support from the Pipecat team. Installation & Setup On this page What You Can Build How It Works Real-time Processing Next Steps Join Our Community Assistant Responses are generated using AI and may contain mistakes.
c_introduction_2d027ecf.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/c++/introduction
Title: SDK Introduction - Pipecat
==================================================

SDK Introduction - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation C++ SDK SDK Introduction Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat C++ SDK provides a native implementation for building voice and multimodal AI applications. It supports: Linux ( x86_64 and aarch64 ) macOS ( aarch64 ) Windows ( x86_64 ) β Dependencies β libcurl The SDK uses libcurl for HTTP requests. Linux macOS Windows Copy Ask AI sudo apt-get install libcurl4-openssl-dev Copy Ask AI sudo apt-get install libcurl4-openssl-dev On macOS libcurl is already included so there is nothing to install. On Windows we use vcpkg to install dependencies. You need to set it up following one of the tutorials . The libcurl dependency will be automatically downloaded when building. β Installation Build the SDK using CMake: Linux/macOS Windows Copy Ask AI cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release ninja -C build Copy Ask AI cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release ninja -C build Copy Ask AI # Initialize Visual Studio environment "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Auxiliary\Build\vcvarsall.bat" amd64 # Configure and build cmake . -Bbuild --preset vcpkg cmake --buildbuild --config Release β Cross-compilation For Linux aarch64: Copy Ask AI cmake . -G Ninja -Bbuild -DCMAKE_TOOLCHAIN_FILE=aarch64-linux-toolchain.cmake -DCMAKE_BUILD_TYPE=Release ninja -C build β Documentation API Reference Complete SDK API documentation Daily Transport WebRTC implementation using Daily Small WebRTC Transport Daily WebRTC Transport On this page Dependencies libcurl Installation Cross-compilation Documentation Assistant Responses are generated using AI and may contain mistakes.
c_introduction_ea0aa8d6.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/c++/introduction#installation
Title: SDK Introduction - Pipecat
==================================================

SDK Introduction - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation C++ SDK SDK Introduction Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Pipecat C++ SDK provides a native implementation for building voice and multimodal AI applications. It supports: Linux ( x86_64 and aarch64 ) macOS ( aarch64 ) Windows ( x86_64 ) β Dependencies β libcurl The SDK uses libcurl for HTTP requests. Linux macOS Windows Copy Ask AI sudo apt-get install libcurl4-openssl-dev Copy Ask AI sudo apt-get install libcurl4-openssl-dev On macOS libcurl is already included so there is nothing to install. On Windows we use vcpkg to install dependencies. You need to set it up following one of the tutorials . The libcurl dependency will be automatically downloaded when building. β Installation Build the SDK using CMake: Linux/macOS Windows Copy Ask AI cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release ninja -C build Copy Ask AI cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release ninja -C build Copy Ask AI # Initialize Visual Studio environment "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Auxiliary\Build\vcvarsall.bat" amd64 # Configure and build cmake . -Bbuild --preset vcpkg cmake --buildbuild --config Release β Cross-compilation For Linux aarch64: Copy Ask AI cmake . -G Ninja -Bbuild -DCMAKE_TOOLCHAIN_FILE=aarch64-linux-toolchain.cmake -DCMAKE_BUILD_TYPE=Release ninja -C build β Documentation API Reference Complete SDK API documentation Daily Transport WebRTC implementation using Daily Small WebRTC Transport Daily WebRTC Transport On this page Dependencies libcurl Installation Cross-compilation Documentation Assistant Responses are generated using AI and may contain mistakes.
c_transport_bd70965d.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/c++/transport
Title: Daily WebRTC Transport - Pipecat
==================================================

Daily WebRTC Transport - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation C++ SDK Daily WebRTC Transport Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Daily transport implementation enables real-time audio and video communication in your Pipecat C++ applications using Dailyβs WebRTC infrastructure. β Dependencies β Daily Core C++ SDK Download the Daily Core C++ SDK from the available releases for your platform and set: Copy Ask AI export DAILY_CORE_PATH = / path / to / daily-core-sdk β Pipecat C++ SDK Build the base Pipecat C++ SDK first and set: Copy Ask AI export PIPECAT_SDK_PATH = / path / to / pipecat-client-cxx β Building First, set a few environment variables: Copy Ask AI PIPECAT_SDK_PATH = /path/to/pipecat-client-cxx DAILY_CORE_PATH = /path/to/daily-core-sdk Then, build the project: Linux/macOS Windows Copy Ask AI cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release ninja -C build Copy Ask AI cmake . -G Ninja -Bbuild -DCMAKE_BUILD_TYPE=Release ninja -C build Copy Ask AI # Initialize Visual Studio environment "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Auxiliary\Build\vcvarsall.bat" amd64 # Configure and build cmake . -Bbuild --preset vcpkg cmake --build build --config Release β Examples Basic Client Simple C++ implementation example Audio Client C++ client with PortAudio support Node.js Server Example Node.js proxy implementation SDK Introduction On this page Dependencies Daily Core C++ SDK Pipecat C++ SDK Building Examples Assistant Responses are generated using AI and may contain mistakes.
client_introduction_394dd56e.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/introduction#pipecatclient
Title: Client SDKs - Pipecat
==================================================

Client SDKs - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Client SDKs Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Client SDKs are currently in transition to a new, simpler API design. The js and react libraries have already been deployed with these changes. Their corresponding documentation along with this top-level documentation has been updated to reflect the latest changes. For transitioning to the new API, please refer to the migration guide . Note that React Native, iOS, and Android SDKs are still in the process of being updated and their documentation will be updated once the new versions are released. If you have any questions or need assistance, please reach out to us on Discord . Pipecat provides client SDKs for multiple platforms, all implementing the RTVI (Real-Time Voice and Video Inference) standard. These SDKs make it easy to build real-time AI applications that can handle voice, video, and text interactions. Javascript Pipecat JS SDK React Pipecat React SDK React Native Pipecat React Native SDK Swift Pipecat iOS SDK Kotlin Pipecat Android SDK C++ Pipecat C++ SDK β Core Functionality All Pipecat client SDKs provide: Media Management Handle device inputs and media streams for audio and video Bot Integration Configure and communicate with your Pipecat bot Session Management Manage connection state and error handling β Core Types β PipecatClient The main class for interacting with Pipecat bots. It is the primary type you will interact with. β Transport The PipecatClient wraps a Transport, which defines and provides the underlying connection mechanism (e.g., WebSocket, WebRTC). Your Pipecat pipeline will contain a corresponding transport. β RTVIMessage Represents a message sent to or received from a Pipecat bot. β Simple Usage Examples Connecting to a Bot Custom Messaging Establish ongoing connections via WebSocket or WebRTC for: Live voice conversations Real-time video processing Continuous interactions javascript react Copy Ask AI // Example: Establishing a real-time connection import { RTVIEvent , RTVIMessage , PipecatClient } from "@pipecat-ai/client-js" ; import { DailyTransport } from "@pipecat-ai/daily-transport" ; const pcClient = new PipecatClient ({ transport: new DailyTransport (), enableMic: true , enableCam: false , enableScreenShare: false , callbacks: { onBotConnected : () => { console . log ( "[CALLBACK] Bot connected" ); }, onBotDisconnected : () => { console . log ( "[CALLBACK] Bot disconnected" ); }, onBotReady : () => { console . log ( "[CALLBACK] Bot ready to chat!" ); }, }, }); try { // Below, we use a REST endpoint to fetch connection credentials for our // Daily Transport. Alternatively, you could provide those credentials // directly to `connect()`. await pcClient . connect ({ endpoint: "https://your-connect-end-point-here/connect" , }); } catch ( e ) { console . error ( e . message ); } // Events (alternative approach to constructor-provided callbacks) pcClient . on ( RTVIEvent . Connected , () => { console . log ( "[EVENT] User connected" ); }); pcClient . on ( RTVIEvent . Disconnected , () => { console . 
log ( "[EVENT] User disconnected" ); }); Establish ongoing connections via WebSocket or WebRTC for: Live voice conversations Real-time video processing Continuous interactions javascript react Copy Ask AI // Example: Establishing a real-time connection import { RTVIEvent , RTVIMessage , PipecatClient } from "@pipecat-ai/client-js" ; import { DailyTransport } from "@pipecat-ai/daily-transport" ; const pcClient = new PipecatClient ({ transport: new DailyTransport (), enableMic: true , enableCam: false , enableScreenShare: false , callbacks: { onBotConnected : () => { console . log ( "[CALLBACK] Bot connected" ); }, onBotDisconnected : () => { console . log ( "[CALLBACK] Bot disconnected" ); }, onBotReady : () => { console . log ( "[CALLBACK] Bot ready to chat!" ); }, }, }); try { // Below, we use a REST endpoint to fetch connection credentials for our // Daily Transport. Alternatively, you could provide those credentials // directly to `connect()`. await pcClient . connect ({ endpoint: "https://your-connect-end-point-here/connect" , }); } catch ( e ) { console . error ( e . message ); } // Events (alternative approach to constructor-provided callbacks) pcClient . on ( RTVIEvent . Connected , () => { console . log ( "[EVENT] User connected" ); }); pcClient . on ( RTVIEvent . Disconnected , () => { console . log ( "[EVENT] User disconnected" ); }); Send custom messages and handle responses from your bot. This is useful for: Running server-side functionality Triggering specific bot actions Querying the server Responding to server requests javascript react Copy Ask AI import { PipecatClient } from "@pipecat-ai/client-js" ; const pcClient = new PipecatClient ({ transport: new DailyTransport (), callbacks: { onBotConnected : () => { pcClient . sendClientRequest ( 'get-language' ) . then (( response ) => { console . log ( "[CALLBACK] Bot using language:" , response ); if ( response !== preferredLanguage ) { pcClient . sendClientMessage ( 'set-language' , { language: preferredLanguage }); } }) . catch (( error ) => { console . error ( "[CALLBACK] Error getting language:" , error ); }); }, onServerMessage : ( message ) => { console . log ( "[CALLBACK] Received message from server:" , message ); }, }, }); await pcClient . connect ({ url: "https://your-daily-room-url" , token: "your-daily-token" }); β About RTVI Pipecatβs client SDKs implement the RTVI (Real-Time Voice and Video Inference) standard, an open specification for real-time AI inference. This means: Your code can work with any RTVI-compatible inference service You get battle-tested tooling for real-time multimedia handling You can easily set up development and testing environments β Next Steps Get started by trying out examples: Simple Chatbot Example Complete client-server example with both bot backend (Python) and frontend implementation (JS, React, React Native, iOS, and Android). More Examples Explore our full collection of example applications and implementations across different platforms and use cases. The RTVI Standard On this page Core Functionality Core Types PipecatClient Transport RTVIMessage Simple Usage Examples About RTVI Next Steps Assistant Responses are generated using AI and may contain mistakes.
client_introduction_769681ac.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/introduction#simple-usage-examples
Title: Client SDKs - Pipecat
==================================================

Client SDKs - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Client SDKs Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The Client SDKs are currently in transition to a new, simpler API design. The js and react libraries have already been deployed with these changes. Their corresponding documentation along with this top-level documentation has been updated to reflect the latest changes. For transitioning to the new API, please refer to the migration guide . Note that React Native, iOS, and Android SDKs are still in the process of being updated and their documentation will be updated once the new versions are released. If you have any questions or need assistance, please reach out to us on Discord . Pipecat provides client SDKs for multiple platforms, all implementing the RTVI (Real-Time Voice and Video Inference) standard. These SDKs make it easy to build real-time AI applications that can handle voice, video, and text interactions. Javascript Pipecat JS SDK React Pipecat React SDK React Native Pipecat React Native SDK Swift Pipecat iOS SDK Kotlin Pipecat Android SDK C++ Pipecat C++ SDK β Core Functionality All Pipecat client SDKs provide: Media Management Handle device inputs and media streams for audio and video Bot Integration Configure and communicate with your Pipecat bot Session Management Manage connection state and error handling β Core Types β PipecatClient The main class for interacting with Pipecat bots. It is the primary type you will interact with. β Transport The PipecatClient wraps a Transport, which defines and provides the underlying connection mechanism (e.g., WebSocket, WebRTC). Your Pipecat pipeline will contain a corresponding transport. β RTVIMessage Represents a message sent to or received from a Pipecat bot. β Simple Usage Examples Connecting to a Bot Custom Messaging Establish ongoing connections via WebSocket or WebRTC for: Live voice conversations Real-time video processing Continuous interactions javascript react Copy Ask AI // Example: Establishing a real-time connection import { RTVIEvent , RTVIMessage , PipecatClient } from "@pipecat-ai/client-js" ; import { DailyTransport } from "@pipecat-ai/daily-transport" ; const pcClient = new PipecatClient ({ transport: new DailyTransport (), enableMic: true , enableCam: false , enableScreenShare: false , callbacks: { onBotConnected : () => { console . log ( "[CALLBACK] Bot connected" ); }, onBotDisconnected : () => { console . log ( "[CALLBACK] Bot disconnected" ); }, onBotReady : () => { console . log ( "[CALLBACK] Bot ready to chat!" ); }, }, }); try { // Below, we use a REST endpoint to fetch connection credentials for our // Daily Transport. Alternatively, you could provide those credentials // directly to `connect()`. await pcClient . connect ({ endpoint: "https://your-connect-end-point-here/connect" , }); } catch ( e ) { console . error ( e . message ); } // Events (alternative approach to constructor-provided callbacks) pcClient . on ( RTVIEvent . Connected , () => { console . log ( "[EVENT] User connected" ); }); pcClient . on ( RTVIEvent . Disconnected , () => { console . 
log ( "[EVENT] User disconnected" ); }); Establish ongoing connections via WebSocket or WebRTC for: Live voice conversations Real-time video processing Continuous interactions javascript react Copy Ask AI // Example: Establishing a real-time connection import { RTVIEvent , RTVIMessage , PipecatClient } from "@pipecat-ai/client-js" ; import { DailyTransport } from "@pipecat-ai/daily-transport" ; const pcClient = new PipecatClient ({ transport: new DailyTransport (), enableMic: true , enableCam: false , enableScreenShare: false , callbacks: { onBotConnected : () => { console . log ( "[CALLBACK] Bot connected" ); }, onBotDisconnected : () => { console . log ( "[CALLBACK] Bot disconnected" ); }, onBotReady : () => { console . log ( "[CALLBACK] Bot ready to chat!" ); }, }, }); try { // Below, we use a REST endpoint to fetch connection credentials for our // Daily Transport. Alternatively, you could provide those credentials // directly to `connect()`. await pcClient . connect ({ endpoint: "https://your-connect-end-point-here/connect" , }); } catch ( e ) { console . error ( e . message ); } // Events (alternative approach to constructor-provided callbacks) pcClient . on ( RTVIEvent . Connected , () => { console . log ( "[EVENT] User connected" ); }); pcClient . on ( RTVIEvent . Disconnected , () => { console . log ( "[EVENT] User disconnected" ); }); Send custom messages and handle responses from your bot. This is useful for: Running server-side functionality Triggering specific bot actions Querying the server Responding to server requests javascript react Copy Ask AI import { PipecatClient } from "@pipecat-ai/client-js" ; const pcClient = new PipecatClient ({ transport: new DailyTransport (), callbacks: { onBotConnected : () => { pcClient . sendClientRequest ( 'get-language' ) . then (( response ) => { console . log ( "[CALLBACK] Bot using language:" , response ); if ( response !== preferredLanguage ) { pcClient . sendClientMessage ( 'set-language' , { language: preferredLanguage }); } }) . catch (( error ) => { console . error ( "[CALLBACK] Error getting language:" , error ); }); }, onServerMessage : ( message ) => { console . log ( "[CALLBACK] Received message from server:" , message ); }, }, }); await pcClient . connect ({ url: "https://your-daily-room-url" , token: "your-daily-token" }); β About RTVI Pipecatβs client SDKs implement the RTVI (Real-Time Voice and Video Inference) standard, an open specification for real-time AI inference. This means: Your code can work with any RTVI-compatible inference service You get battle-tested tooling for real-time multimedia handling You can easily set up development and testing environments β Next Steps Get started by trying out examples: Simple Chatbot Example Complete client-server example with both bot backend (Python) and frontend implementation (JS, React, React Native, iOS, and Android). More Examples Explore our full collection of example applications and implementations across different platforms and use cases. The RTVI Standard On this page Core Functionality Core Types PipecatClient Transport RTVIMessage Simple Usage Examples About RTVI Next Steps Assistant Responses are generated using AI and may contain mistakes.
client_migration-guide_cf5b62a9.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/migration-guide#migration-guides
Title: RTVIClient Migration Guide - Pipecat
==================================================

RTVIClient Migration Guide - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation RTVIClient Migration Guide Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport This guide covers the high-level changes between the old RTVIClient and the new PipecatClient . For specific code updates, refer to the platform-specific migration guides. β Key changes Client Name : The class name has changed from RTVIClient to PipecatClient . Pipeline Connection : Previously, the client expected a REST endpoint for gathering connection information as part of the constructor, which was difficult to update or bypass. The new client expects connection information to be provided directly to the connect() method, either as an object with the details your Transport requires or as an object with REST endpoint details for acquiring them. Actions and helpers : These have gone away in favor of built-in methods for common actions like function call handling and appending to the LLM context, and, for custom actions, a simple set of methods for sending messages to the bot and handling responses. See registerFunctionCallHandler() , appendToContext() , sendClientMessage() , and sendClientRequest() for more details. Bot Configuration : This functionality has been removed as a security measure, so that a client cannot inherently override a bot's configuration and use its credentials for its own purposes. If you need the client to initialize or update the bot configuration, you will need to do so through an API call to your backend or by building on top of the client-server messaging, which has now been made easier. The Client SDKs are currently in the process of making these changes. At this time, only the JavaScript and React libraries have been updated and released. Their corresponding documentation along with this top-level documentation has been updated to reflect the latest changes. The React Native, iOS, and Android SDKs are still in the process of being updated; their documentation will be updated and a migration guide provided once the new versions are released. If you have any questions or need assistance, please reach out to us on Discord . β Migration guides JavaScript Migrate your JavaScript client code to the new PipecatClient React Update your React components to use the new PipecatClient The RTVI Standard SDK Introduction On this page Key changes Migration guides Assistant Responses are generated using AI and may contain mistakes.
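To make the list of key changes concrete, here is a minimal TypeScript sketch of what post-migration client code might look like. It uses only the method names called out above (connect(), registerFunctionCallHandler(), appendToContext(), sendClientMessage(), sendClientRequest()); the exact signatures, the handler argument shape, the endpoint URL, and the 'get_weather'/'get-language' names are illustrative assumptions, not a definitive reference.

```typescript
// Sketch of the post-migration flow, assuming the method names listed in
// "Key changes" above. Signatures are illustrative, not authoritative.
import { PipecatClient } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";

async function run(preferredLanguage: string) {
  const pcClient = new PipecatClient({ transport: new DailyTransport() });

  // Connection details now go to connect(), not the constructor: either the
  // transport's own parameters or a REST endpoint that returns them.
  // (Endpoint URL is a placeholder.)
  await pcClient.connect({ endpoint: "https://your-server.example/connect" });

  // Function call handling replaces the old helper/action mechanism.
  // (The handler's argument shape is assumed from the RTVI
  // 'llm-function-call' payload: function_name, tool_call_id, args.)
  pcClient.registerFunctionCallHandler("get_weather", async (call: any) => {
    return { conditions: "sunny", requested: call.args };
  });

  // Appending to the LLM context is now a direct method call; the field
  // names mirror the RTVI 'append-to-context' message.
  pcClient.appendToContext({
    role: "user",
    content: `Please answer in ${preferredLanguage}.`,
    run_immediately: false,
  });

  // Custom actions become plain client-server messages.
  const language = (await pcClient.sendClientRequest("get-language")) as string;
  if (language !== preferredLanguage) {
    pcClient.sendClientMessage("set-language", { language: preferredLanguage });
  }
}
```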
client_rtvi-standard_30d7d570.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#rtvi-message-types
Title: The RTVI Standard - Pipecat
==================================================

The RTVI Standard - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation The RTVI Standard Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real-time. This page documents version 1.0 of the RTVI standard, released in June 2025. β Key Features Connection Management RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state. Transcriptions The standard includes built-in support for real-time transcription of audio streams. Client-Server Messaging The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses. Advanced LLM Interactions The standard supports advanced interactions with large language models (LLMs), including context management, function call handline, and search results. Service-Specific Insights RTVI supports events to provide insight into the input/output and state for typical services that exist in speech-to-speech workflows. Metrics and Monitoring RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services. β Terms Client : The front-end application or user interface that interacts with the RTVI server. Server : The backend-end service that runs the AI framework and processes requests from the client. User : The end user interacting with the client application. Bot : The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service. β RTVI Message Format The messages defined as part of the RTVI protocol adhere to the following format: Copy Ask AI { "id" : string , "label" : "rtvi-ai" , "type" : string , "data" : unknown } β id string A unique identifier for the message, used to correlate requests and responses. β label string default: "rtvi-ai" required A label that identifies this message as an RTVI message. This field is required and should always be set to 'rtvi-ai' . β type string required The type of message being sent. This field is required and should be set to one of the predefined RTVI message types listed below. β data unknown The payload of the message, which can be any data structure relevant to the message type. β RTVI Message Types Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type below includes either a π€ or π emoji to denote whether the message is sent from the bot (π€) or client (π). 
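The message format above maps directly onto a small TypeScript type, which can be useful when implementing or validating an RTVI client. This is a sketch derived only from the envelope just described; the interface name and the example client-ready payload are illustrative.

```typescript
// The RTVI envelope described above: id, label ("rtvi-ai"), type, data.
interface RTVIMessage<T = unknown> {
  id?: string;       // unique identifier, used to correlate requests/responses
  label: "rtvi-ai";  // required, always "rtvi-ai"
  type: string;      // one of the RTVI message types listed below
  data?: T;          // payload, shape depends on `type`
}

// Example: a client-ready message sent by the client after the transport
// media channels connect. The `about` fields follow the AboutClient object
// described under "client-ready" below.
const clientReady: RTVIMessage<{
  version: string;
  about: { library: string; library_version?: string };
}> = {
  id: crypto.randomUUID(),
  label: "rtvi-ai",
  type: "client-ready",
  data: {
    version: "1.0",
    about: { library: "example-client", library_version: "0.0.1" },
  },
};
```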
β Connection Management β client-ready π Indicates that the client is ready to receive messages and interact with the server. Typically sent after the transport media channels have connected. type : 'client-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : AboutClient Object An object containing information about the client, such as its rtvi-version, client library, and any other relevant metadata. The AboutClient object follows this structure: Show AboutClient β library string required β library_version string β platform string β platform_version string β platform_details any Any platform-specific details that may be relevant to the server. This could include information about the browser, operating system, or any other environment-specific data needed by the server. This field is optional and open-ended, so please be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personal-identifiable information. β bot-ready π€ Indicates that the bot is ready to receive messages and interact with the client. Typically send after the transport media channels have connected. type : 'bot-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : any (Optional) An object containing information about the server or bot. Itβs structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time, without any built-in security concerns. Please be mindful of the data you include here and any security concerns that may arise from exposing sensitive information. β disconnect-bot π Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: Disconnets should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client may want to remain connected to the transport but no longer wishes to interact with the bot. type : 'disconnect-bot' data : undefined β error π€ Indicates an error occurred during bot initialization or runtime. type : 'error' data : message : string Description of the error. fatal : boolean Indicates if the error is fatal to the session. β Transcription β user-started-speaking π€ Emitted when the user begins speaking type : 'user-started-speaking' data : None β user-stopped-speaking π€ Emitted when the user stops speaking type : 'user-stopped-speaking' data : None β bot-started-speaking π€ Emitted when the bot begins speaking type : 'bot-started-speaking' data : None β bot-stopped-speaking π€ Emitted when the bot stops speaking type : 'bot-stopped-speaking' data : None β user-transcription π€ Real-time transcription of user speech, including both partial and final results. type : 'user-transcription' data : text : string The transcribed text of the user. final : boolean Indicates if this is a final transcription or a partial result. timestamp : string The timestamp when the transcription was generated. user_id : string Identifier for the user who spoke. β bot-transcription π€ Transcription of the botβs speech. Note: This protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions. 
Rather, the event is typically sent for each sentence of the botβs response. This difference is currently due to limitations in TTS services which mostly do not support (or support well), accurate timing information. If/when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription to match your botβs speaking, you can try using the bot-tts-text message type. type : 'bot-transcription' data : text : string The transcribed text from the bot, typically aggregated at a per-sentence level. β Client-Server Messaging β server-message π€ An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client. type : 'server-message' data : any The data can be any JSON-serializable object, formatted according to your own specifications. β client-message π An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server. type : 'client-message' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. β server-response π€ An message sent from the server to the client in response to a client-message . IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'client-message' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. β error-response π€ Error response to a specific client message. IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'error-response' data : error : string β Advanced LLM Interactions β append-to-context π A message sent from the client to the server to append data to the context of the current llm conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant. type : 'append-to-context' data : role : "user" | "assistant" The role the context should be appended to. Currently only supports "user" and "assistant" . content : unknown The content to append to the context. This can be any data structure the llm understand. run_immediately : boolean (optional) Indicates whether the context should be run immediately after appending. Defaults to false . If set to false , the context will be appended but not executed until the next llm run. β llm-function-call π€ A function call request from the LLM, sent from the bot to the client. Note that for most cases, an LLM function call will be handled completely server-side. However, in the event that the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required. type : 'llm-function-call' data : function_name : string Name of the function to be called. tool_call_id : string Unique identifier for this function call. args : Record<string, unknown> Arguments to be passed to the function. β llm-function-call-result π The result of the function call requested by the LLM, returned from the client. 
Advanced LLM Interactions

append-to-context 🏄
A message sent from the client to the server to append data to the context of the current LLM conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant.
type: 'append-to-context'
data:
- role ("user" | "assistant"): The role the context should be appended to. Currently only supports "user" and "assistant".
- content (unknown): The content to append to the context. This can be any data structure the LLM understands.
- run_immediately (boolean, optional): Indicates whether the context should be run immediately after appending. Defaults to false. If set to false, the context will be appended but not executed until the next LLM run.

llm-function-call 🤖
A function call request from the LLM, sent from the bot to the client. Note that in most cases an LLM function call will be handled completely server-side. However, when the call requires input from the client, or the client needs to be aware of the function call, this message/response schema is required.
type: 'llm-function-call'
data:
- function_name (string): Name of the function to be called.
- tool_call_id (string): Unique identifier for this function call.
- args (Record<string, unknown>): Arguments to be passed to the function.

llm-function-call-result 🏄
The result of the function call requested by the LLM, returned from the client.
type: 'llm-function-call-result'
data:
- function_name (string): Name of the called function.
- tool_call_id (string): Identifier matching the original function call.
- args (Record<string, unknown>): Arguments that were passed to the function.
- result (Record<string, unknown> | string): The result returned by the function.

bot-llm-search-response 🤖
Search results from the LLM's knowledge base. Currently, Google Gemini is the only LLM that supports built-in search. However, we expect other LLMs to follow suit, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies.
type: 'bot-llm-search-response'
data:
- search_result (string, optional): Raw search result text.
- rendered_content (string, optional): Formatted version of the search results.
- origins (Array<Origin object>): Source information and confidence scores for search results.

The Origin object follows this structure:
{
  "site_uri": string (optional),
  "site_title": string (optional),
  "results": Array<{ "text": string, "confidence": number[] }>
}

Example:
{
  "id": undefined,
  "label": "rtvi-ai",
  "type": "bot-llm-search-response",
  "data": {
    "origins": [
      {
        "results": [
          {
            "confidence": [0.9881149530410768],
            "text": "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm."
          },
          {
            "confidence": [0.9692034721374512],
            "text": "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm."
          }
        ],
        "site_title": "vanderbilt.edu",
        "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi"
      },
      {
        "results": [
          {
            "confidence": [0.6554043292999268],
            "text": "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields."
          }
        ],
        "site_title": "wikipedia.org",
        "site_uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo"
      }
    ],
    "rendered_content": "<style> \n .container ... </div> \n </div> \n ",
    "search_result": "Several events are happening at Vanderbilt University: \n\n * Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm. \n * A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm. \n\n In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields. For the most recent news, you should check Vanderbilt's official news website. \n "
  }
}
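To make the round trip concrete, here is a sketch of a client-handled function call and its result. The function name, arguments, and result values are invented for illustration; only the message types and field names come from the standard.

// Hypothetical function call exchange between bot and client.
const functionCall = {
  id: "call-7",
  label: "rtvi-ai",
  type: "llm-function-call",
  data: {
    function_name: "get_local_time",       // made-up tool name
    tool_call_id: "tool_abc123",
    args: { timezone: "America/Chicago" },
  },
};

const functionCallResult = {
  id: "call-8",
  label: "rtvi-ai",
  type: "llm-function-call-result",
  data: {
    function_name: "get_local_time",
    tool_call_id: "tool_abc123",           // matches the original call
    args: { timezone: "America/Chicago" },
    result: { time: "09:41" },
  },
};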
Service-Specific Insights

bot-llm-started 🤖
Indicates LLM processing has begun.
type: 'bot-llm-started'
data: None

bot-llm-stopped 🤖
Indicates LLM processing has completed.
type: 'bot-llm-stopped'
data: None

user-llm-text 🤖
Aggregated user input text that is sent to the LLM.
type: 'user-llm-text'
data:
- text (string): The user's input text to be processed by the LLM.

bot-llm-text 🤖
Individual tokens streamed from the LLM as they are generated.
type: 'bot-llm-text'
data:
- text (string): The token text from the LLM.

bot-tts-started 🤖
Indicates text-to-speech (TTS) processing has begun.
type: 'bot-tts-started'
data: None

bot-tts-stopped 🤖
Indicates text-to-speech (TTS) processing has completed.
type: 'bot-tts-stopped'
data: None

bot-tts-text 🤖
The per-token text output of the text-to-speech (TTS) service (what the TTS actually says).
type: 'bot-tts-text'
data:
- text (string): The text representation of the generated bot speech.

Metrics and Monitoring

metrics 🤖
Performance metrics for various processing stages and services. Each message will contain entries for one or more of the metric types: processing, ttfb, characters.
type: 'metrics'
data:
- processing (optional): Processing time metrics.
- ttfb (optional): Time to first byte metrics.
- characters (optional): Character processing metrics.

For each metric type, the data structure is an array of objects with the following structure:
- processor (string): The name of the processor or service that generated the metric.
- value (number): The value of the metric, typically in milliseconds or a character count.
- model (string, optional): The model of the service that generated the metric, if applicable.

Example:
{
  "type": "metrics",
  "data": {
    "processing": [
      { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 0.0005140304565429688 }
    ],
    "ttfb": [
      { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 0.1573178768157959 }
    ],
    "characters": [
      { "model": "eleven_flash_v2_5", "processor": "ElevenLabsTTSService#0", "value": 43 }
    ]
  }
}
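Since all of these events arrive over the same channel, a client typically switches on the type field. The sketch below is our own illustration of that pattern in TypeScript; it is not an API from the Pipecat client SDKs.

// Minimal client-side routing of incoming RTVI messages by type.
interface RTVIMessage {
  id?: string;
  label: "rtvi-ai";
  type: string;
  data?: unknown;
}

function handleMessage(msg: RTVIMessage): void {
  switch (msg.type) {
    case "bot-ready":
      console.log("bot is ready", msg.data);
      break;
    case "user-transcription":
      console.log("user said:", (msg.data as { text: string }).text);
      break;
    case "bot-llm-text":
      console.log("LLM token:", (msg.data as { text: string }).text);
      break;
    case "metrics":
      console.log("metrics update", msg.data);
      break;
    default:
      // ignore message types this client does not handle
      break;
  }
}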
client_rtvi-standard_63b2ad42.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#advanced-llm-interactions
Title: The RTVI Standard - Pipecat
==================================================
The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real-time. This page documents version 1.0 of the RTVI standard, released in June 2025.

Key Features
- Connection Management: RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state.
- Transcriptions: The standard includes built-in support for real-time transcription of audio streams.
- Client-Server Messaging: The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses.
- Advanced LLM Interactions: The standard supports advanced interactions with large language models (LLMs), including context management, function call handling, and search results.
- Service-Specific Insights: RTVI supports events to provide insight into the input/output and state for typical services that exist in speech-to-speech workflows.
- Metrics and Monitoring: RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services.

Terms
- Client: The front-end application or user interface that interacts with the RTVI server.
- Server: The backend service that runs the AI framework and processes requests from the client.
- User: The end user interacting with the client application.
- Bot: The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service.

RTVI Message Format
The messages defined as part of the RTVI protocol adhere to the following format:
{
  "id": string,
  "label": "rtvi-ai",
  "type": string,
  "data": unknown
}
- id (string): A unique identifier for the message, used to correlate requests and responses.
- label (string, required, default "rtvi-ai"): A label that identifies this message as an RTVI message. This field is required and should always be set to 'rtvi-ai'.
- type (string, required): The type of message being sent. This field is required and should be set to one of the predefined RTVI message types.
- data (unknown): The payload of the message, which can be any data structure relevant to the message type.

RTVI Message Types
Following the above format, the standard describes the various message types it defines. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type includes either a 🤖 or 🏄 emoji to denote whether the message is sent from the bot (🤖) or client (🏄).
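As a small sketch of validating the envelope described above (our own helper, not part of any Pipecat SDK):

// Returns true if a decoded value looks like an RTVI message envelope.
function isRTVIMessage(value: unknown): value is { label: "rtvi-ai"; type: string } {
  if (typeof value !== "object" || value === null) return false;
  const msg = value as { label?: unknown; type?: unknown };
  return msg.label === "rtvi-ai" && typeof msg.type === "string";
}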
client_rtvi-standard_7786a985.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#disconnect-bot-%F0%9F%8F%84
Title: The RTVI Standard - Pipecat
==================================================
client_rtvi-standard_d5f72539.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#param-id
Title: The RTVI Standard - Pipecat
==================================================
client_rtvi-standard_d8e92ae0.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://docs.pipecat.ai/client/rtvi-standard#param-library-version
Title: The RTVI Standard - Pipecat
==================================================
|
4 |
+
|
5 |
+
The RTVI Standard - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation The RTVI Standard Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Client SDKs The RTVI Standard RTVIClient Migration Guide Javascript SDK SDK Introduction API Reference Transport packages React SDK SDK Introduction API Reference React Native SDK SDK Introduction API Reference iOS SDK SDK Introduction API Reference Transport packages Android SDK SDK Introduction API Reference Transport packages C++ SDK SDK Introduction Daily WebRTC Transport The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real-time. This page documents version 1.0 of the RTVI standard, released in June 2025. β Key Features Connection Management RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state. Transcriptions The standard includes built-in support for real-time transcription of audio streams. Client-Server Messaging The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses. Advanced LLM Interactions The standard supports advanced interactions with large language models (LLMs), including context management, function call handline, and search results. Service-Specific Insights RTVI supports events to provide insight into the input/output and state for typical services that exist in speech-to-speech workflows. Metrics and Monitoring RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services. β Terms Client : The front-end application or user interface that interacts with the RTVI server. Server : The backend-end service that runs the AI framework and processes requests from the client. User : The end user interacting with the client application. Bot : The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service. β RTVI Message Format The messages defined as part of the RTVI protocol adhere to the following format: Copy Ask AI { "id" : string , "label" : "rtvi-ai" , "type" : string , "data" : unknown } β id string A unique identifier for the message, used to correlate requests and responses. β label string default: "rtvi-ai" required A label that identifies this message as an RTVI message. This field is required and should always be set to 'rtvi-ai' . β type string required The type of message being sent. This field is required and should be set to one of the predefined RTVI message types listed below. β data unknown The payload of the message, which can be any data structure relevant to the message type. β RTVI Message Types Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type below includes either a π€ or π emoji to denote whether the message is sent from the bot (π€) or client (π). 
Connection Management client-ready 👤 Indicates that the client is ready to receive messages and interact with the server. Typically sent after the transport media channels have connected. type : 'client-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : AboutClient Object An object containing information about the client, such as its rtvi-version, client library, and any other relevant metadata. The AboutClient object follows this structure: library string required library_version string platform string platform_version string platform_details any Any platform-specific details that may be relevant to the server. This could include information about the browser, operating system, or any other environment-specific data needed by the server. This field is optional and open-ended, so please be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personally identifiable information. bot-ready 🤖 Indicates that the bot is ready to receive messages and interact with the client. Typically sent after the transport media channels have connected. type : 'bot-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : any (Optional) An object containing information about the server or bot. Its structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time, without any built-in security concerns. Please be mindful of the data you include here and any security concerns that may arise from exposing sensitive information. disconnect-bot 👤 Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: Disconnects should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client may want to remain connected to the transport but no longer wishes to interact with the bot. type : 'disconnect-bot' data : undefined error 🤖 Indicates an error occurred during bot initialization or runtime. type : 'error' data : message : string Description of the error. fatal : boolean Indicates if the error is fatal to the session. Transcription user-started-speaking 🤖 Emitted when the user begins speaking. type : 'user-started-speaking' data : None user-stopped-speaking 🤖 Emitted when the user stops speaking. type : 'user-stopped-speaking' data : None bot-started-speaking 🤖 Emitted when the bot begins speaking. type : 'bot-started-speaking' data : None bot-stopped-speaking 🤖 Emitted when the bot stops speaking. type : 'bot-stopped-speaking' data : None user-transcription 🤖 Real-time transcription of user speech, including both partial and final results. type : 'user-transcription' data : text : string The transcribed text of the user. final : boolean Indicates if this is a final transcription or a partial result. timestamp : string The timestamp when the transcription was generated. user_id : string Identifier for the user who spoke. bot-transcription 🤖 Transcription of the bot's speech. Note: This protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions.
Rather, the event is typically sent for each sentence of the bot's response. This difference is currently due to limitations in TTS services, which mostly do not support (or support well) accurate timing information. If/when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription to match your bot's speaking, you can try using the bot-tts-text message type. type : 'bot-transcription' data : text : string The transcribed text from the bot, typically aggregated at a per-sentence level. Client-Server Messaging server-message 🤖 An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client. type : 'server-message' data : any The data can be any JSON-serializable object, formatted according to your own specifications. client-message 👤 An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server. type : 'client-message' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. server-response 🤖 A message sent from the server to the client in response to a client-message . IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'server-response' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. error-response 🤖 Error response to a specific client message. IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'error-response' data : error : string Advanced LLM Interactions append-to-context 👤 A message sent from the client to the server to append data to the context of the current LLM conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant. type : 'append-to-context' data : role : "user" | "assistant" The role the context should be appended to. Currently only supports "user" and "assistant" . content : unknown The content to append to the context. This can be any data structure the LLM understands. run_immediately : boolean (optional) Indicates whether the context should be run immediately after appending. Defaults to false . If set to false , the context will be appended but not executed until the next LLM run. llm-function-call 🤖 A function call request from the LLM, sent from the bot to the client. Note that for most cases, an LLM function call will be handled completely server-side. However, in the event that the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required. type : 'llm-function-call' data : function_name : string Name of the function to be called. tool_call_id : string Unique identifier for this function call. args : Record<string, unknown> Arguments to be passed to the function. llm-function-call-result 👤 The result of the function call requested by the LLM, returned from the client.
type : 'llm-function-call-result' data : function_name : string Name of the called function. tool_call_id : string Identifier matching the original function call. args : Record<string, unknown> Arguments that were passed to the function. result : Record<string, unknown> | string The result returned by the function. bot-llm-search-response 🤖 Search results from the LLM's knowledge base. Currently, Google Gemini is the only LLM that supports built-in search. However, we expect other LLMs to follow suit, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies. type : 'bot-llm-search-response' data : search_result : string (optional) Raw search result text. rendered_content : string (optional) Formatted version of the search results. origins : Array<Origin Object> Source information and confidence scores for search results. The Origin Object follows this structure: { "site_uri" : string (optional) , "site_title" : string (optional) , "results" : Array< { "text" : string , "confidence" : number [] } > } Example: "id" : undefined "label" : "rtvi-ai" "type" : "bot-llm-search-response" "data" : { "origins" : [ { "results" : [ { "confidence" : [ 0.9881149530410768 ], "text" : "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm." }, { "confidence" : [ 0.9692034721374512 ], "text" : "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm." } ], "site_title" : "vanderbilt.edu" , "site_uri" : "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi" }, { "results" : [ { "confidence" : [ 0.6554043292999268 ], "text" : "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields." } ], "site_title" : "wikipedia.org" , "site_uri" : "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo" }], "rendered_content" : "<style> \n .container ... </div> \n </div> \n " , "search_result" : "Several events are happening at Vanderbilt University: \n\n * Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm. \n * A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm. \n\n In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields. For the most recent news, you should check Vanderbilt's official news website. \n " } Service-Specific Insights bot-llm-started 🤖 Indicates LLM processing has begun. type : bot-llm-started data : None bot-llm-stopped 🤖 Indicates LLM processing has completed. type : bot-llm-stopped data : None user-llm-text 🤖 Aggregated user input text that is sent to the LLM. type : 'user-llm-text' data : text : string The user's input text to be processed by the LLM. bot-llm-text 🤖 Individual tokens streamed from the LLM as they are generated. type : 'bot-llm-text' data : text : string The token text from the LLM. bot-tts-started 🤖 Indicates text-to-speech (TTS) processing has begun.
type : 'bot-tts-started' data : None bot-tts-stopped 🤖 Indicates text-to-speech (TTS) processing has completed. type : 'bot-tts-stopped' data : None bot-tts-text 🤖 The per-token text output of the text-to-speech (TTS) service (what the TTS actually says). type : 'bot-tts-text' data : text : string The text representation of the generated bot speech. Metrics and Monitoring metrics 🤖 Performance metrics for various processing stages and services. Each message will contain entries for one or more of the metric types: processing , ttfb , characters . type : 'metrics' data : processing : [See Below] (optional) Processing time metrics. ttfb : [See Below] (optional) Time to first byte metrics. characters : [See Below] (optional) Character processing metrics. For each metric type, the data structure is an array of objects with the following structure: processor : string The name of the processor or service that generated the metric. value : number The value of the metric, typically in milliseconds or character count. model : string (optional) The model of the service that generated the metric, if applicable. Example: { "type" : "metrics" , "data" : { "processing" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 0.0005140304565429688 } ], "ttfb" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 0.1573178768157959 } ], "characters" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 43 } ] } }
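A small, hedged sketch of how a client might route incoming RTVI messages by their type field, using the transcription, metrics, and error payloads documented above. The on_message entry point and print-based handlers are hypothetical; only the message types and field names come from the standard.

def on_message(message: dict) -> None:
    # Route an incoming RTVI message based on its "type" field.
    msg_type = message.get("type")
    data = message.get("data") or {}

    if msg_type == "user-transcription":
        # Only act on final transcripts; partial results arrive with final=False.
        if data.get("final"):
            print(f"[user {data.get('user_id')}] {data.get('text')}")
    elif msg_type == "bot-transcription":
        print(f"[bot] {data.get('text')}")
    elif msg_type == "metrics":
        # Each metrics message may carry processing, ttfb, and/or characters entries.
        for entry in data.get("ttfb", []):
            print(f"ttfb {entry['processor']}: {entry['value'] * 1000:.1f} ms")
    elif msg_type == "error":
        print(f"bot error (fatal={data.get('fatal')}): {data.get('message')}")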
|
client_rtvi-standard_ef4532ad.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/client/rtvi-standard#user-transcription-%F0%9F%A4%96
|
2 |
+
Title: The RTVI Standard - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
The RTVI (Real-Time Voice and Video Inference) standard defines a set of message types and structures sent between clients and servers. It is designed to facilitate real-time interactions between clients and AI applications that require voice, video, and text communication. It provides a consistent framework for building applications that can communicate with AI models and the backends running those models in real time. This page documents version 1.0 of the RTVI standard, released in June 2025. Key Features Connection Management RTVI provides a flexible connection model that allows clients to connect to AI services and coordinate state. Transcriptions The standard includes built-in support for real-time transcription of audio streams. Client-Server Messaging The standard defines a messaging protocol for sending and receiving messages between clients and servers, allowing for efficient communication of requests and responses. Advanced LLM Interactions The standard supports advanced interactions with large language models (LLMs), including context management, function call handling, and search results. Service-Specific Insights RTVI supports events to provide insight into the input/output and state for typical services that exist in speech-to-speech workflows. Metrics and Monitoring RTVI provides mechanisms for collecting metrics and monitoring the performance of server-side services. Terms Client : The front-end application or user interface that interacts with the RTVI server. Server : The backend service that runs the AI framework and processes requests from the client. User : The end user interacting with the client application. Bot : The AI interacting with the user, technically an amalgamation of a large language model (LLM) and a text-to-speech (TTS) service. RTVI Message Format The messages defined as part of the RTVI protocol adhere to the following format: { "id" : string , "label" : "rtvi-ai" , "type" : string , "data" : unknown } id string A unique identifier for the message, used to correlate requests and responses. label string default: "rtvi-ai" required A label that identifies this message as an RTVI message. This field is required and should always be set to 'rtvi-ai' . type string required The type of message being sent. This field is required and should be set to one of the predefined RTVI message types listed below. data unknown The payload of the message, which can be any data structure relevant to the message type. RTVI Message Types Following the above format, this section describes the various message types defined by the RTVI standard. Each message type has a specific purpose and structure, allowing for clear communication between clients and servers. Each message type below includes either a 🤖 or 👤 emoji to denote whether the message is sent from the bot (🤖) or the client (👤).
Connection Management client-ready 👤 Indicates that the client is ready to receive messages and interact with the server. Typically sent after the transport media channels have connected. type : 'client-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : AboutClient Object An object containing information about the client, such as its rtvi-version, client library, and any other relevant metadata. The AboutClient object follows this structure: library string required library_version string platform string platform_version string platform_details any Any platform-specific details that may be relevant to the server. This could include information about the browser, operating system, or any other environment-specific data needed by the server. This field is optional and open-ended, so please be mindful of the data you include here and any security concerns that may arise from exposing sensitive or personally identifiable information. bot-ready 🤖 Indicates that the bot is ready to receive messages and interact with the client. Typically sent after the transport media channels have connected. type : 'bot-ready' data : version : string The version of the RTVI standard being used. This is useful for ensuring compatibility between client and server implementations. about : any (Optional) An object containing information about the server or bot. Its structure and value are both undefined by default. This provides flexibility to include any relevant metadata your client may need to know about the server at connection time, without any built-in security concerns. Please be mindful of the data you include here and any security concerns that may arise from exposing sensitive information. disconnect-bot 👤 Indicates that the client wishes to disconnect from the bot. Typically used when the client is shutting down or no longer needs to interact with the bot. Note: Disconnects should happen automatically when either the client or bot disconnects from the transport, so this message is intended for the case where a client may want to remain connected to the transport but no longer wishes to interact with the bot. type : 'disconnect-bot' data : undefined error 🤖 Indicates an error occurred during bot initialization or runtime. type : 'error' data : message : string Description of the error. fatal : boolean Indicates if the error is fatal to the session. Transcription user-started-speaking 🤖 Emitted when the user begins speaking. type : 'user-started-speaking' data : None user-stopped-speaking 🤖 Emitted when the user stops speaking. type : 'user-stopped-speaking' data : None bot-started-speaking 🤖 Emitted when the bot begins speaking. type : 'bot-started-speaking' data : None bot-stopped-speaking 🤖 Emitted when the bot stops speaking. type : 'bot-stopped-speaking' data : None user-transcription 🤖 Real-time transcription of user speech, including both partial and final results. type : 'user-transcription' data : text : string The transcribed text of the user. final : boolean Indicates if this is a final transcription or a partial result. timestamp : string The timestamp when the transcription was generated. user_id : string Identifier for the user who spoke. bot-transcription 🤖 Transcription of the bot's speech. Note: This protocol currently does not match the user transcription format to support real-time timestamping for bot transcriptions.
Rather, the event is typically sent for each sentence of the bot's response. This difference is currently due to limitations in TTS services, which mostly do not support (or support well) accurate timing information. If/when this changes, this protocol may be updated to include the necessary timing information. For now, if you want to attempt real-time transcription to match your bot's speaking, you can try using the bot-tts-text message type. type : 'bot-transcription' data : text : string The transcribed text from the bot, typically aggregated at a per-sentence level. Client-Server Messaging server-message 🤖 An arbitrary message sent from the server to the client. This can be used for custom interactions or commands. This message may be coupled with the client-message message type to handle responses from the client. type : 'server-message' data : any The data can be any JSON-serializable object, formatted according to your own specifications. client-message 👤 An arbitrary message sent from the client to the server. This can be used for custom interactions or commands. This message may be coupled with the server-response message type to handle responses from the server. type : 'client-message' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. server-response 🤖 A message sent from the server to the client in response to a client-message . IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'server-response' data : t : string d : unknown (optional) The data payload should contain a t field indicating the type of message and an optional d field containing any custom, corresponding data needed for the message. error-response 🤖 Error response to a specific client message. IMPORTANT : The id should match the id of the original client-message to correlate the response with the request. type : 'error-response' data : error : string Advanced LLM Interactions append-to-context 👤 A message sent from the client to the server to append data to the context of the current LLM conversation. This is useful for providing text-based content for the user or augmenting the context for the assistant. type : 'append-to-context' data : role : "user" | "assistant" The role the context should be appended to. Currently only supports "user" and "assistant" . content : unknown The content to append to the context. This can be any data structure the LLM understands. run_immediately : boolean (optional) Indicates whether the context should be run immediately after appending. Defaults to false . If set to false , the context will be appended but not executed until the next LLM run. llm-function-call 🤖 A function call request from the LLM, sent from the bot to the client. Note that for most cases, an LLM function call will be handled completely server-side. However, in the event that the call requires input from the client or the client needs to be aware of the function call, this message/response schema is required. type : 'llm-function-call' data : function_name : string Name of the function to be called. tool_call_id : string Unique identifier for this function call. args : Record<string, unknown> Arguments to be passed to the function. llm-function-call-result 👤 The result of the function call requested by the LLM, returned from the client.
type : 'llm-function-call-result' data : function_name : string Name of the called function. tool_call_id : string Identifier matching the original function call. args : Record<string, unknown> Arguments that were passed to the function. result : Record<string, unknown> | string The result returned by the function. bot-llm-search-response 🤖 Search results from the LLM's knowledge base. Currently, Google Gemini is the only LLM that supports built-in search. However, we expect other LLMs to follow suit, which is why this message type is defined as part of the RTVI standard. As more LLMs add support for this feature, the format of this message type may evolve to accommodate discrepancies. type : 'bot-llm-search-response' data : search_result : string (optional) Raw search result text. rendered_content : string (optional) Formatted version of the search results. origins : Array<Origin Object> Source information and confidence scores for search results. The Origin Object follows this structure: { "site_uri" : string (optional) , "site_title" : string (optional) , "results" : Array< { "text" : string , "confidence" : number [] } > } Example: "id" : undefined "label" : "rtvi-ai" "type" : "bot-llm-search-response" "data" : { "origins" : [ { "results" : [ { "confidence" : [ 0.9881149530410768 ], "text" : "* Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm." }, { "confidence" : [ 0.9692034721374512 ], "text" : "* A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm." } ], "site_title" : "vanderbilt.edu" , "site_uri" : "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHwif83VK9KAzrbMSGSBsKwL8vWfSfC9pgEWYKmStHyqiRoV1oe8j1S0nbwRg_iWgqAr9wUkiegu3ATC8Ll-cuE-vpzwElRHiJ2KgRYcqnOQMoOeokVpWqi" }, { "results" : [ { "confidence" : [ 0.6554043292999268 ], "text" : "In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields." } ], "site_title" : "wikipedia.org" , "site_uri" : "https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQESbF-ijx78QbaglrhflHCUWdPTD4M6tYOQigW5hgsHNctRlAHu9ktfPmJx7DfoP5QicE0y-OQY1cRl9w4Id0btiFgLYSKIm2-SPtOHXeNrAlgA7mBnclaGrD7rgnLIbrjl8DgUEJrrvT0CKzuo" }], "rendered_content" : "<style> \n .container ... </div> \n </div> \n " , "search_result" : "Several events are happening at Vanderbilt University: \n\n * Juneteenth: A Freedom Celebration is scheduled for June 18th from 12:00 pm to 2:00 pm. \n * A Juneteenth celebration at Fort Negley Park will take place on June 19th from 5:00 pm to 9:30 pm. \n\n In addition to these events, Vanderbilt University is a large research institution with ongoing activities across many fields. For the most recent news, you should check Vanderbilt's official news website. \n " } Service-Specific Insights bot-llm-started 🤖 Indicates LLM processing has begun. type : bot-llm-started data : None bot-llm-stopped 🤖 Indicates LLM processing has completed. type : bot-llm-stopped data : None user-llm-text 🤖 Aggregated user input text that is sent to the LLM. type : 'user-llm-text' data : text : string The user's input text to be processed by the LLM. bot-llm-text 🤖 Individual tokens streamed from the LLM as they are generated. type : 'bot-llm-text' data : text : string The token text from the LLM. bot-tts-started 🤖 Indicates text-to-speech (TTS) processing has begun.
type : 'bot-tts-started' data : None bot-tts-stopped 🤖 Indicates text-to-speech (TTS) processing has completed. type : 'bot-tts-stopped' data : None bot-tts-text 🤖 The per-token text output of the text-to-speech (TTS) service (what the TTS actually says). type : 'bot-tts-text' data : text : string The text representation of the generated bot speech. Metrics and Monitoring metrics 🤖 Performance metrics for various processing stages and services. Each message will contain entries for one or more of the metric types: processing , ttfb , characters . type : 'metrics' data : processing : [See Below] (optional) Processing time metrics. ttfb : [See Below] (optional) Time to first byte metrics. characters : [See Below] (optional) Character processing metrics. For each metric type, the data structure is an array of objects with the following structure: processor : string The name of the processor or service that generated the metric. value : number The value of the metric, typically in milliseconds or character count. model : string (optional) The model of the service that generated the metric, if applicable. Example: { "type" : "metrics" , "data" : { "processing" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 0.0005140304565429688 } ], "ttfb" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 0.1573178768157959 } ], "characters" : [ { "model" : "eleven_flash_v2_5" , "processor" : "ElevenLabsTTSService#0" , "value" : 43 } ] } }
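As a rough illustration of the llm-function-call / llm-function-call-result exchange described above, the sketch below shows a client answering a function call request. The get_weather branch and the send_to_server callback are hypothetical; only the field names (function_name, tool_call_id, args, result) come from the standard.

def handle_function_call(message: dict, send_to_server) -> None:
    # Respond to an llm-function-call request with an llm-function-call-result.
    data = message["data"]
    if data["function_name"] == "get_weather":
        # Hypothetical client-side lookup; a real client would call its own APIs here.
        result = {"conditions": "sunny", "location": data["args"].get("location")}
    else:
        result = "unsupported function"

    send_to_server({
        "id": message.get("id"),
        "label": "rtvi-ai",
        "type": "llm-function-call-result",
        "data": {
            "function_name": data["function_name"],
            "tool_call_id": data["tool_call_id"],  # must match the original call
            "args": data["args"],
            "result": result,
        },
    })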
|
daily_rest-helpers_053a92dd.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-daily-api-key
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST API Documentation: for the complete Daily REST API reference and additional details. Classes DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. display_name string default: "sw-sip-dialin" Display name for the SIP endpoint video boolean default: false Whether video is enabled for SIP sip_mode string default: "dial-in" SIP connection mode num_endpoints integer default: 1 Number of SIP endpoints from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. bucket_name string required Name of the S3 bucket for storing recordings bucket_region string required AWS region where the S3 bucket is located assume_role_arn string required ARN of the IAM role to assume for S3 access allow_api_access boolean default: false Whether to allow API access to the recordings from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) DailyRoomProperties Properties that configure a Daily room's behavior and features. exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) enable_chat boolean default: false Whether chat is enabled in the room enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled enable_emoji_reactions boolean default: false Whether emoji reactions are enabled eject_at_room_exp boolean default: false Whether to eject participants when room expires enable_dialout boolean Whether dial-out is enabled enable_recording string Recording settings ("cloud", "local", or "raw-tracks") geo string Geographic region for room max_participants number Maximum number of participants allowed in the room recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings sip DailyRoomSipParams SIP configuration parameters sip_uri dict SIP URI configuration (returned by Daily) start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available.
import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) DailyRoomParams Parameters for creating a new Daily room. name string Room name (if not provided, one will be generated) privacy string default: "public" Room privacy setting ("private" or "public") properties DailyRoomProperties Room configuration properties import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) DailyRoomObject Response object representing a Daily room. id string Unique room identifier name string Room name api_created boolean Whether the room was created via API privacy string Room privacy setting url string Complete room URL created_at string Room creation timestamp in ISO 8601 format config DailyRoomProperties Room configuration from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) DailyMeetingTokenProperties Properties for configuring a Daily meeting token. room_name string The room this token is valid for. If not set, token is valid for all rooms. eject_at_token_exp boolean Whether to eject user when token expires eject_after_elapsed integer Eject user after this many seconds nbf integer "Not before" timestamp - users cannot join before this time exp integer Expiration timestamp - users cannot join after this time is_owner boolean Whether token grants owner privileges user_name string User's display name in the meeting user_id string Unique identifier for the user (36 char limit) enable_screenshare boolean Whether user can share their screen start_video_off boolean Whether to join with video off start_audio_off boolean Whether to join with audio off enable_recording string Recording settings ("cloud", "local", or "raw-tracks") enable_prejoin_ui boolean Whether to show prejoin UI start_cloud_recording boolean Whether to start cloud recording when user joins permissions dict Initial default permissions for a non-meeting-owner participant DailyMeetingTokenParams Parameters for creating a Daily meeting token.
properties DailyMeetingTokenProperties Token configuration properties from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) Initialize DailyRESTHelper Create a new instance of the Daily REST helper. daily_api_key string required Your Daily API key daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) Create Room Creates a new Daily room with specified parameters. params DailyRoomParams required Room configuration parameters including name, privacy, and properties # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) Get Room From URL Retrieves room information using a Daily room URL. room_url string required The complete Daily room URL room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) Get Token Generates a meeting token for a specific room. room_url string required The complete Daily room URL expiry_time float default: "3600" Token expiration time in seconds eject_at_token_exp bool default: "False" Whether to eject user when token expires owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) Delete Room By URL Deletes a room using its URL. room_url string required The complete Daily room URL success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) Delete Room By Name Deletes a room using its name. room_name string required The name of the Daily room success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) Get Name From URL Extracts the room name from a Daily room URL.
room_url string required The complete Daily room URL room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room"
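Putting the helper methods above together, a typical end-to-end flow might look like the following sketch. It assumes a DAILY_API_KEY environment variable and that DailyRESTHelper is importable from the same module as the classes shown in the examples above; adjust the details to your project.

import asyncio
import os
import time

import aiohttp

from pipecat.transports.services.helpers.daily_rest import (
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
)

async def main():
    async with aiohttp.ClientSession() as session:
        helper = DailyRESTHelper(
            daily_api_key=os.environ["DAILY_API_KEY"],
            aiohttp_session=session,
        )
        # Create a private room that expires in 15 minutes.
        room = await helper.create_room(
            DailyRoomParams(
                privacy="private",
                properties=DailyRoomProperties(exp=time.time() + 900),
            )
        )
        # Issue a short-lived, non-owner token for a participant.
        token = await helper.get_token(room_url=room.url, expiry_time=900, owner=False)
        print(room.url, token)
        # Clean up when the session is over.
        await helper.delete_room_by_url(room.url)

asyncio.run(main())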
|
daily_rest-helpers_19bd418b.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-room-url-3
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details β Classes β DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. β display_name string default: "sw-sip-dialin" Display name for the SIP endpoint β video boolean default: false Whether video is enabled for SIP β sip_mode string default: "dial-in" SIP connection mode β num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) β RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. β bucket_name string required Name of the S3 bucket for storing recordings β bucket_region string required AWS region where the S3 bucket is located β assume_role_arn string required ARN of the IAM role to assume for S3 access β allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) β DailyRoomProperties Properties that configure a Daily roomβs behavior and features. β exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) β enable_chat boolean default: false Whether chat is enabled in the room β enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled β enable_emoji_reactions boolean default: false Whether emoji reactions are enabled β eject_at_room_exp boolean default: false Whether to eject participants when room expires β enable_dialout boolean Whether dial-out is enabled β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β geo string Geographic region for room β max_participants number Maximum number of participants allowed in the room β recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings β sip DailyRoomSipParams SIP configuration parameters β sip_uri dict SIP URI configuration (returned by Daily) β start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) β DailyRoomParams Parameters for creating a new Daily room. β name string Room name (if not provided, one will be generated) β privacy string default: "public" Room privacy setting (βprivateβ or βpublicβ) β properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) β DailyRoomObject Response object representing a Daily room. β id string Unique room identifier β name string Room name β api_created boolean Whether the room was created via API β privacy string Room privacy setting β url string Complete room URL β created_at string Room creation timestamp in ISO 8601 format β config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) β DailyMeetingTokenProperties Properties for configuring a Daily meeting token. β room_name string The room this token is valid for. If not set, token is valid for all rooms. β eject_at_token_exp boolean Whether to eject user when token expires β eject_after_elapsed integer Eject user after this many seconds β nbf integer βNot beforeβ timestamp - users cannot join before this time β exp integer Expiration timestamp - users cannot join after this time β is_owner boolean Whether token grants owner privileges β user_name string Userβs display name in the meeting β user_id string Unique identifier for the user (36 char limit) β enable_screenshare boolean Whether user can share their screen β start_video_off boolean Whether to join with video off β start_audio_off boolean Whether to join with audio off β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β enable_prejoin_ui boolean Whether to show prejoin UI β start_cloud_recording boolean Whether to start cloud recording when user joins β permissions dict Initial default permissions for a non-meeting-owner participant β DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
β properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) β Initialize DailyRESTHelper Create a new instance of the Daily REST helper. β daily_api_key string required Your Daily API key β daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL β aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) β Create Room Creates a new Daily room with specified parameters. β params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) β Get Room From URL Retrieves room information using a Daily room URL. β room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) β Get Token Generates a meeting token for a specific room. β room_url string required The complete Daily room URL β expiry_time float default: "3600" Token expiration time in seconds β eject_at_token_exp bool default: "False" Whether to eject user when token expires β owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) β params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) β Delete Room By URL Deletes a room using its URL. β room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) β Delete Room By Name Deletes a room using its name. β room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) β Get Name From URL Extracts the room name from a Daily room URL. 
β room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
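A brief sketch, using only the classes documented above, of configuring a room for dial-in SIP. Whether the created room actually reports a SIP endpoint depends on your Daily domain settings, so the endpoint check is hedged accordingly.

import time

from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomParams,
    DailyRoomProperties,
    DailyRoomSipParams,
)

# Configure a room for a single dial-in SIP endpoint, expiring in one hour.
params = DailyRoomParams(
    properties=DailyRoomProperties(
        exp=time.time() + 3600,
        sip=DailyRoomSipParams(display_name="support-line", num_endpoints=1),
    )
)

# After create_room(), the returned room's config exposes the SIP endpoint (if any):
# room = await helper.create_room(params)
# if room.config.sip_endpoint:
#     print(room.config.sip_endpoint)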
|
daily_rest-helpers_33ab0889.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-max-participants
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details β Classes β DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. β display_name string default: "sw-sip-dialin" Display name for the SIP endpoint β video boolean default: false Whether video is enabled for SIP β sip_mode string default: "dial-in" SIP connection mode β num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) β RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. β bucket_name string required Name of the S3 bucket for storing recordings β bucket_region string required AWS region where the S3 bucket is located β assume_role_arn string required ARN of the IAM role to assume for S3 access β allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) β DailyRoomProperties Properties that configure a Daily roomβs behavior and features. β exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) β enable_chat boolean default: false Whether chat is enabled in the room β enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled β enable_emoji_reactions boolean default: false Whether emoji reactions are enabled β eject_at_room_exp boolean default: false Whether to eject participants when room expires β enable_dialout boolean Whether dial-out is enabled β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β geo string Geographic region for room β max_participants number Maximum number of participants allowed in the room β recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings β sip DailyRoomSipParams SIP configuration parameters β sip_uri dict SIP URI configuration (returned by Daily) β start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) β DailyRoomParams Parameters for creating a new Daily room. β name string Room name (if not provided, one will be generated) β privacy string default: "public" Room privacy setting (βprivateβ or βpublicβ) β properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) β DailyRoomObject Response object representing a Daily room. β id string Unique room identifier β name string Room name β api_created boolean Whether the room was created via API β privacy string Room privacy setting β url string Complete room URL οΏ½οΏ½οΏ½ created_at string Room creation timestamp in ISO 8601 format β config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) β DailyMeetingTokenProperties Properties for configuring a Daily meeting token. β room_name string The room this token is valid for. If not set, token is valid for all rooms. β eject_at_token_exp boolean Whether to eject user when token expires β eject_after_elapsed integer Eject user after this many seconds β nbf integer βNot beforeβ timestamp - users cannot join before this time β exp integer Expiration timestamp - users cannot join after this time β is_owner boolean Whether token grants owner privileges β user_name string Userβs display name in the meeting β user_id string Unique identifier for the user (36 char limit) β enable_screenshare boolean Whether user can share their screen β start_video_off boolean Whether to join with video off β start_audio_off boolean Whether to join with audio off β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β enable_prejoin_ui boolean Whether to show prejoin UI β start_cloud_recording boolean Whether to start cloud recording when user joins β permissions dict Initial default permissions for a non-meeting-owner participant β DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
β properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) β Initialize DailyRESTHelper Create a new instance of the Daily REST helper. β daily_api_key string required Your Daily API key β daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL β aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) β Create Room Creates a new Daily room with specified parameters. β params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) β Get Room From URL Retrieves room information using a Daily room URL. β room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) β Get Token Generates a meeting token for a specific room. β room_url string required The complete Daily room URL β expiry_time float default: "3600" Token expiration time in seconds β eject_at_token_exp bool default: "False" Whether to eject user when token expires β owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) β params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) β Delete Room By URL Deletes a room using its URL. β room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) β Delete Room By Name Deletes a room using its name. β room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) β Get Name From URL Extracts the room name from a Daily room URL. 
β room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
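The snippets above show each helper call in isolation. As a rough end-to-end sketch (not taken from the page itself), the documented calls can be combined into one create-room, issue-token, clean-up flow; the DAILY_API_KEY environment variable and the script scaffolding are assumptions:

import asyncio
import os
import time

import aiohttp

from pipecat.transports.services.helpers.daily_rest import (
    DailyRESTHelper,
    DailyRoomParams,
    DailyRoomProperties,
)


async def main():
    # One shared HTTP session for all Daily REST calls
    async with aiohttp.ClientSession() as session:
        helper = DailyRESTHelper(
            daily_api_key=os.environ["DAILY_API_KEY"],  # assumed env var name
            aiohttp_session=session,
        )

        # Create a private room that expires in 30 minutes
        room = await helper.create_room(
            DailyRoomParams(
                privacy="private",
                properties=DailyRoomProperties(exp=time.time() + 1800),
            )
        )
        print(f"Room URL: {room.url}")

        # Issue a non-owner token that ejects the user when it expires
        token = await helper.get_token(
            room_url=room.url,
            expiry_time=1800,
            owner=False,
            eject_at_token_exp=True,
        )
        print(f"Token: {token}")

        # Clean up the room once the session is over
        if await helper.delete_room_by_url(room.url):
            print("Room deleted")


if __name__ == "__main__":
    asyncio.run(main())

Keeping the aiohttp session in an async with block ensures the helper's HTTP connections are closed when the flow finishes.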
|
daily_rest-helpers_41f93594.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-enable-dialout
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details β Classes β DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. β display_name string default: "sw-sip-dialin" Display name for the SIP endpoint β video boolean default: false Whether video is enabled for SIP β sip_mode string default: "dial-in" SIP connection mode β num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) β RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. β bucket_name string required Name of the S3 bucket for storing recordings β bucket_region string required AWS region where the S3 bucket is located β assume_role_arn string required ARN of the IAM role to assume for S3 access β allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) β DailyRoomProperties Properties that configure a Daily roomβs behavior and features. β exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) β enable_chat boolean default: false Whether chat is enabled in the room β enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled β enable_emoji_reactions boolean default: false Whether emoji reactions are enabled β eject_at_room_exp boolean default: false Whether to eject participants when room expires β enable_dialout boolean Whether dial-out is enabled β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β geo string Geographic region for room β max_participants number Maximum number of participants allowed in the room β recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings β sip DailyRoomSipParams SIP configuration parameters β sip_uri dict SIP URI configuration (returned by Daily) β start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) β DailyRoomParams Parameters for creating a new Daily room. β name string Room name (if not provided, one will be generated) β privacy string default: "public" Room privacy setting (βprivateβ or βpublicβ) β properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) β DailyRoomObject Response object representing a Daily room. β id string Unique room identifier β name string Room name β api_created boolean Whether the room was created via API β privacy string Room privacy setting β url string Complete room URL β created_at string Room creation timestamp in ISO 8601 format β config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) β DailyMeetingTokenProperties Properties for configuring a Daily meeting token. β room_name string The room this token is valid for. If not set, token is valid for all rooms. β eject_at_token_exp boolean Whether to eject user when token expires β eject_after_elapsed integer Eject user after this many seconds β nbf integer βNot beforeβ timestamp - users cannot join before this time β exp integer Expiration timestamp - users cannot join after this time β is_owner boolean Whether token grants owner privileges β user_name string Userβs display name in the meeting β user_id string Unique identifier for the user (36 char limit) β enable_screenshare boolean Whether user can share their screen β start_video_off boolean Whether to join with video off β start_audio_off boolean Whether to join with audio off β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β enable_prejoin_ui boolean Whether to show prejoin UI β start_cloud_recording boolean Whether to start cloud recording when user joins β permissions dict Initial default permissions for a non-meeting-owner participant β DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
β properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) β Initialize DailyRESTHelper Create a new instance of the Daily REST helper. β daily_api_key string required Your Daily API key β daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL β aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) β Create Room Creates a new Daily room with specified parameters. β params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) β Get Room From URL Retrieves room information using a Daily room URL. β room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) β Get Token Generates a meeting token for a specific room. β room_url string required The complete Daily room URL β expiry_time float default: "3600" Token expiration time in seconds β eject_at_token_exp bool default: "False" Whether to eject user when token expires β owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) β params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) β Delete Room By URL Deletes a room using its URL. β room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) β Delete Room By Name Deletes a room using its name. β room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) β Get Name From URL Extracts the room name from a Daily room URL. 
β room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
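As a hedged sketch building on the SIP and dial-out fields documented above, a dial-out-capable room might be configured as follows; the display name and endpoint count are placeholders, and account-level dial-out permissions are assumed:

import time

from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomParams,
    DailyRoomProperties,
    DailyRoomSipParams,
)

params = DailyRoomParams(
    privacy="private",
    properties=DailyRoomProperties(
        exp=time.time() + 3600,           # room expires in one hour
        enable_dialout=True,              # allow outbound (dial-out) calls
        sip=DailyRoomSipParams(           # dial-in SIP endpoint configuration
            display_name="support-line",  # placeholder display name
            video=False,
            num_endpoints=1,
        ),
    ),
)

# After creating the room, the returned object's config exposes the SIP URI:
# room = await helper.create_room(params)
# if room.config.sip_endpoint:
#     print(f"SIP endpoint: {room.config.sip_endpoint}")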
|
daily_rest-helpers_5ded7034.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#create-room
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details β Classes β DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. β display_name string default: "sw-sip-dialin" Display name for the SIP endpoint β video boolean default: false Whether video is enabled for SIP β sip_mode string default: "dial-in" SIP connection mode β num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) β RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. β bucket_name string required Name of the S3 bucket for storing recordings β bucket_region string required AWS region where the S3 bucket is located β assume_role_arn string required ARN of the IAM role to assume for S3 access β allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) β DailyRoomProperties Properties that configure a Daily roomβs behavior and features. β exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) β enable_chat boolean default: false Whether chat is enabled in the room β enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled β enable_emoji_reactions boolean default: false Whether emoji reactions are enabled β eject_at_room_exp boolean default: false Whether to eject participants when room expires β enable_dialout boolean Whether dial-out is enabled β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β geo string Geographic region for room β max_participants number Maximum number of participants allowed in the room β recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings β sip DailyRoomSipParams SIP configuration parameters β sip_uri dict SIP URI configuration (returned by Daily) β start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) β DailyRoomParams Parameters for creating a new Daily room. β name string Room name (if not provided, one will be generated) β privacy string default: "public" Room privacy setting (βprivateβ or βpublicβ) β properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) β DailyRoomObject Response object representing a Daily room. β id string Unique room identifier β name string Room name β api_created boolean Whether the room was created via API β privacy string Room privacy setting β url string Complete room URL β created_at string Room creation timestamp in ISO 8601 format β config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) β DailyMeetingTokenProperties Properties for configuring a Daily meeting token. β room_name string The room this token is valid for. If not set, token is valid for all rooms. β eject_at_token_exp boolean Whether to eject user when token expires β eject_after_elapsed integer Eject user after this many seconds β nbf integer βNot beforeβ timestamp - users cannot join before this time β exp integer Expiration timestamp - users cannot join after this time β is_owner boolean Whether token grants owner privileges β user_name string Userβs display name in the meeting β user_id string Unique identifier for the user (36 char limit) β enable_screenshare boolean Whether user can share their screen β start_video_off boolean Whether to join with video off β start_audio_off boolean Whether to join with audio off β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β enable_prejoin_ui boolean Whether to show prejoin UI β start_cloud_recording boolean Whether to start cloud recording when user joins β permissions dict Initial default permissions for a non-meeting-owner participant β DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
β properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) β Initialize DailyRESTHelper Create a new instance of the Daily REST helper. β daily_api_key string required Your Daily API key β daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL β aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) β Create Room Creates a new Daily room with specified parameters. β params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) β Get Room From URL Retrieves room information using a Daily room URL. β room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) β Get Token Generates a meeting token for a specific room. β room_url string required The complete Daily room URL β expiry_time float default: "3600" Token expiration time in seconds β eject_at_token_exp bool default: "False" Whether to eject user when token expires β owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) β params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) β Delete Room By URL Deletes a room using its URL. β room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) β Delete Room By Name Deletes a room using its name. β room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) β Get Name From URL Extracts the room name from a Daily room URL. 
β room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
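A hedged sketch of issuing a restricted, non-owner participant token with get_token and the token properties documented above; the room URL, expiry, and user name are illustrative only:

from pipecat.transports.services.helpers.daily_rest import (
    DailyMeetingTokenParams,
    DailyMeetingTokenProperties,
)

# Token properties for a guest who joins muted, with video off,
# and without screenshare permission
token_params = DailyMeetingTokenParams(
    properties=DailyMeetingTokenProperties(
        user_name="Guest",       # illustrative display name
        start_audio_off=True,
        start_video_off=True,
        enable_screenshare=False,
    )
)

# token = await helper.get_token(
#     room_url="https://your-domain.daily.co/my-room",  # illustrative URL
#     expiry_time=900,            # 15 minutes
#     owner=False,
#     eject_at_token_exp=True,
#     params=token_params,
# )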
|
daily_rest-helpers_6423975d.txt
ADDED
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
URL: https://docs.pipecat.ai/server/utilities/daily/rest-helpers#param-privacy-1
|
2 |
+
Title: Daily REST Helper - Pipecat
|
3 |
+
==================================================
|
4 |
+
|
5 |
+
Daily REST Helper - Pipecat Pipecat home page Search... β K Ask AI Search... Navigation Service Utilities Daily REST Helper Getting Started Guides Server APIs Client SDKs Community GitHub Examples Changelog Server API Reference API Reference Reference docs Services Supported Services Transport Serializers Speech-to-Text LLM Text-to-Speech Speech-to-Speech Image Generation Video Memory Vision Analytics & Monitoring Utilities Advanced Frame Processors Audio Processing Frame Filters Metrics and Telemetry MCP Observers Service Utilities Daily REST Helper Smart Turn Detection Task Handling and Monitoring Telephony Text Aggregators and Filters User and Bot Transcriptions User Interruptions Frameworks RTVI Pipecat Flows Pipeline PipelineParams PipelineTask Pipeline Idle Detection Pipeline Heartbeats ParallelPipeline Daily REST API Documentation For complete Daily REST API reference and additional details β Classes β DailyRoomSipParams Configuration for SIP (Session Initiation Protocol) parameters. β display_name string default: "sw-sip-dialin" Display name for the SIP endpoint β video boolean default: false Whether video is enabled for SIP β sip_mode string default: "dial-in" SIP connection mode β num_endpoints integer default: 1 Number of SIP endpoints Copy Ask AI from pipecat.transports.services.helpers.daily_rest import DailyRoomSipParams sip_params = DailyRoomSipParams( display_name = "conference-line" , video = True , num_endpoints = 2 ) β RecordingsBucketConfig Configuration for storing Daily recordings in a custom S3 bucket. β bucket_name string required Name of the S3 bucket for storing recordings β bucket_region string required AWS region where the S3 bucket is located β assume_role_arn string required ARN of the IAM role to assume for S3 access β allow_api_access boolean default: false Whether to allow API access to the recordings Copy Ask AI from pipecat.transports.services.helpers.daily_rest import RecordingsBucketConfig bucket_config = RecordingsBucketConfig( bucket_name = "my-recordings-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRecordingsRole" , allow_api_access = True ) β DailyRoomProperties Properties that configure a Daily roomβs behavior and features. β exp float Room expiration time as Unix timestamp (e.g., time.time() + 300 for 5 minutes) β enable_chat boolean default: false Whether chat is enabled in the room β enable_prejoin_ui boolean default: false Whether the prejoin lobby UI is enabled β enable_emoji_reactions boolean default: false Whether emoji reactions are enabled β eject_at_room_exp boolean default: false Whether to eject participants when room expires β enable_dialout boolean Whether dial-out is enabled β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β geo string Geographic region for room β max_participants number Maximum number of participants allowed in the room β recordings_bucket RecordingsBucketConfig Configuration for custom S3 bucket recordings β sip DailyRoomSipParams SIP configuration parameters β sip_uri dict SIP URI configuration (returned by Daily) β start_video_off boolean default: false Whether the camera video is turned off by default The class also includes a sip_endpoint property that returns the SIP endpoint URI if available. 
Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomProperties, DailyRoomSipParams, RecordingsBucketConfig, ) properties = DailyRoomProperties( exp = time.time() + 3600 , # 1 hour from now enable_chat = True , enable_emoji_reactions = True , enable_recording = "cloud" , geo = "us-west" , max_participants = 50 , sip = DailyRoomSipParams( display_name = "conference" ), recordings_bucket = RecordingsBucketConfig( bucket_name = "my-bucket" , bucket_region = "us-west-2" , assume_role_arn = "arn:aws:iam::123456789012:role/DailyRole" ) ) # Access SIP endpoint if available if properties.sip_endpoint: print ( f "SIP endpoint: { properties.sip_endpoint } " ) β DailyRoomParams Parameters for creating a new Daily room. β name string Room name (if not provided, one will be generated) β privacy string default: "public" Room privacy setting (βprivateβ or βpublicβ) β properties DailyRoomProperties Room configuration properties Copy Ask AI import time from pipecat.transports.services.helpers.daily_rest import ( DailyRoomParams, DailyRoomProperties, ) params = DailyRoomParams( name = "team-meeting" , privacy = "private" , properties = DailyRoomProperties( enable_chat = True , exp = time.time() + 7200 # 2 hours from now ) ) β DailyRoomObject Response object representing a Daily room. β id string Unique room identifier β name string Room name β api_created boolean Whether the room was created via API β privacy string Room privacy setting β url string Complete room URL β created_at string Room creation timestamp in ISO 8601 format β config DailyRoomProperties Room configuration Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyRoomObject, DailyRoomProperties, ) # Example of what a DailyRoomObject looks like when received room = DailyRoomObject( id = "abc123" , name = "team-meeting" , api_created = True , privacy = "private" , url = "https://your-domain.daily.co/team-meeting" , created_at = "2024-01-20T10:00:00.000Z" , config = DailyRoomProperties( enable_chat = True , exp = 1705743600 ) ) β DailyMeetingTokenProperties Properties for configuring a Daily meeting token. β room_name string The room this token is valid for. If not set, token is valid for all rooms. β eject_at_token_exp boolean Whether to eject user when token expires β eject_after_elapsed integer Eject user after this many seconds β nbf integer βNot beforeβ timestamp - users cannot join before this time β exp integer Expiration timestamp - users cannot join after this time β is_owner boolean Whether token grants owner privileges β user_name string Userβs display name in the meeting β user_id string Unique identifier for the user (36 char limit) β enable_screenshare boolean Whether user can share their screen β start_video_off boolean Whether to join with video off β start_audio_off boolean Whether to join with audio off β enable_recording string Recording settings (βcloudβ, βlocalβ, or βraw-tracksβ) β enable_prejoin_ui boolean Whether to show prejoin UI β start_cloud_recording boolean Whether to start cloud recording when user joins β permissions dict Initial default permissions for a non-meeting-owner participant β DailyMeetingTokenParams Parameters for creating a Daily meeting token. 
β properties DailyMeetingTokenProperties Token configuration properties Copy Ask AI from pipecat.transports.services.helpers.daily_rest import ( DailyMeetingTokenParams, DailyMeetingTokenProperties, ) token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , enable_screenshare = True , start_video_off = True , permissions = { "canSend" : [ "video" , "audio" ]} ) ) β Initialize DailyRESTHelper Create a new instance of the Daily REST helper. β daily_api_key string required Your Daily API key β daily_api_url string default: "https://api.daily.co/v1" The Daily API base URL β aiohttp_session aiohttp.ClientSession required An aiohttp client session for making HTTP requests Copy Ask AI helper = DailyRESTHelper( daily_api_key = "your-api-key" , aiohttp_session = session ) β Create Room Creates a new Daily room with specified parameters. β params DailyRoomParams required Room configuration parameters including name, privacy, and properties Copy Ask AI # Create a room that expires in 1 hour params = DailyRoomParams( name = "my-room" , privacy = "private" , properties = DailyRoomProperties( exp = time.time() + 3600 , enable_chat = True ) ) room = await helper.create_room(params) print ( f "Room URL: { room.url } " ) β Get Room From URL Retrieves room information using a Daily room URL. β room_url string required The complete Daily room URL Copy Ask AI room = await helper.get_room_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room.name } " ) β Get Token Generates a meeting token for a specific room. β room_url string required The complete Daily room URL β expiry_time float default: "3600" Token expiration time in seconds β eject_at_token_exp bool default: "False" Whether to eject user when token expires β owner bool default: "True" Whether the token should have owner privileges (overrides any setting in params) β params DailyMeetingTokenParams Additional token configuration. Note that room_name , exp , eject_at_token_exp , and is_owner will be set based on the other function parameters. Copy Ask AI # Basic token generation token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , # 30 minutes owner = True , eject_at_token_exp = True ) # Advanced token generation with additional properties token_params = DailyMeetingTokenParams( properties = DailyMeetingTokenProperties( user_name = "John Doe" , start_video_off = True ) ) token = await helper.get_token( room_url = "https://your-domain.daily.co/my-room" , expiry_time = 1800 , owner = False , eject_at_token_exp = True , params = token_params ) β Delete Room By URL Deletes a room using its URL. β room_url string required The complete Daily room URL Copy Ask AI success = await helper.delete_room_by_url( "https://your-domain.daily.co/my-room" ) if success: print ( "Room deleted successfully" ) β Delete Room By Name Deletes a room using its name. β room_name string required The name of the Daily room Copy Ask AI success = await helper.delete_room_by_name( "my-room" ) if success: print ( "Room deleted successfully" ) β Get Name From URL Extracts the room name from a Daily room URL. 
β room_url string required The complete Daily room URL Copy Ask AI room_name = helper.get_name_from_url( "https://your-domain.daily.co/my-room" ) print ( f "Room name: { room_name } " ) # Outputs: "my-room" Turn Tracking Observer Smart Turn Overview On this page Classes DailyRoomSipParams RecordingsBucketConfig DailyRoomProperties DailyRoomParams DailyRoomObject DailyMeetingTokenProperties DailyMeetingTokenParams Initialize DailyRESTHelper Create Room Get Room From URL Get Token Delete Room By URL Delete Room By Name Get Name From URL Assistant Responses are generated using AI and may contain mistakes.
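Finally, a hedged sketch combining the recording fields documented above into a private room that records to a custom S3 bucket; the bucket name, region, and role ARN are placeholders:

import time

from pipecat.transports.services.helpers.daily_rest import (
    DailyRoomParams,
    DailyRoomProperties,
    RecordingsBucketConfig,
)

params = DailyRoomParams(
    privacy="private",
    properties=DailyRoomProperties(
        exp=time.time() + 3600,
        enable_recording="cloud",                  # record to the cloud...
        recordings_bucket=RecordingsBucketConfig(  # ...into a custom bucket
            bucket_name="example-recordings-bucket",   # placeholder
            bucket_region="us-east-1",                 # placeholder
            assume_role_arn="arn:aws:iam::111111111111:role/ExampleDailyRole",  # placeholder
            allow_api_access=False,
        ),
    ),
)

# room = await helper.create_room(params)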
|